Identities = 50/58 (86%) , Positives = 56/58 (96%)

Query: 137 LHKIEMCYDYNKQSQAVAHKLGFTLEANIRDREDAQGKRCGDMRFGLLRSEWEKKRR 194 
LHKIELGCYDYNKQSQAVARKLGFTLEAN RDR+D QG+RCGDMRFGLLRSEWE++++ 
Sbjct: 1 LHKIELGCYDYMKQSQAVARKLGFTLEANARDRKDVQGRRCGDMRFGLLRSEWEEQKQ 58

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 230 

A DNA sequence (GBSx0244) was identified in S.agalactiae <SEQ ID 733> which encodes the amino acid
sequence <SEQ ID 734>. This protein is predicted to be ribosomal-protein-alanine N-acetyltransferase. 
Analysis of this protein sequence reveals the following: 

Possible site: 54 

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.4066 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
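
Localisation output of this kind can be tabulated mechanically. The following is a minimal Python sketch, not part of the original analysis, that reads "Final Results" lines and reports the compartment with the highest certainty. The function names and the regular expression are illustrative assumptions about the line layout; the pattern tolerates stray spaces inside the certainty value.

```python
import re

# Assumed layout: "bacterial <location> [---] Certainty=<value> (<verdict>)".
# This pattern is an illustration, not a specification of the prediction tool.
RESULT_LINE = re.compile(
    r"bacterial\s+(?P<location>\w+)\W*Certainty\s*=\s*"
    r"(?P<value>[\d.\s]+?)\s*\((?P<verdict>[^)]+)\)"
)


def parse_final_results(text):
    """Return {location: (certainty, verdict)} for each result line found."""
    results = {}
    for match in RESULT_LINE.finditer(text):
        # Remove spaces that text extraction may have left inside the number.
        value = float(match.group("value").replace(" ", ""))
        results[match.group("location")] = (value, match.group("verdict"))
    return results


def predicted_location(results):
    """Return the location with the highest certainty value."""
    return max(results, key=lambda location: results[location][0])


example = """
bacterial cytoplasm --- Certainty=0.4066 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
"""
parsed = parse_final_results(example)
best = predicted_location(parsed)
```

Here `parsed` maps each compartment to its certainty and verdict, and `best` names the compartment with the largest certainty.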


A related GBS nucleic acid sequence <SEQ ID 9599> which encodes amino acid sequence <SEQ ID 9600> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:BAB04418 GB:AP001509 ribosomal-protein-alanine 
N-acetyltransferase [Bacillus halodurans]

Identities = 63/185 (34%) , Positives = 95/185 (51%) , Gaps = 11/185 (5%) 

Query: 53 KALPKLETDRLILRQRTVGDVPAMFDYVCLEEVAYPAGLSPIASLEDEYDYFENRYyQNL 112 
K P LET RLILR+ T D +4- Y+ +EV GL P +LED E +Y+++ 

Sbjct: 6 KRFPILETKRLILRKITTDDARSILSYLSDKEVMKYFGLEPFQTLEDALG--EIAWYESI 63

Query: 113 EKAKLPSGYGITVKGSDRIIGSCAFN HRHEDDVFEICYLLHPDYWGHGYMTEAVA 167 

+ +GIT+KG D +IGSC F+ H + FE+ L YWG G +EA+ 
Sbjct: 64 LHEQTGIRWGITLKGQDEVIGSCGFHQWVPKHHRAEIGFELSKL YWGQGIASEAIR 119 


Query: 168 ALIEVGFTLLNLHKIEIRCYDYNKQSRRVAEICLGFTLEATIRDRKDNQDNRCVNLIYGLL 227 

A+I+ GF L L +1+ N S+R+ EK GF E +R + +Y LL 

Sbjct: 120 AVIQYGFEHLELQRIQALIEPPNIPSQRLVEKQGFIS3GLLRSYEYTCGKFDDLYMYSLL 179 

Query: 228 RSEWE 232

Sbjct: 180 KRDFD 184 
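
The alignment statistics quoted for this hit can be re-derived from the raw counts. As a hedged illustration (the helper names are hypothetical), the sketch below parses a statistics line and recomputes each percentage with truncating integer division, which is consistent with the figures quoted for this alignment (for example, 11/185 is reported as 5%). Whether the alignment program truncates in general is an assumption.

```python
import re

# Matches entries such as "Identities = 63/185 (34%)".
STAT = re.compile(r"(\w+)\s*=\s*(\d+)/(\d+)\s*\((\d+)%\)")


def parse_alignment_stats(line):
    """Return {name: (count, length, reported_percent)} for one statistics line."""
    return {
        name: (int(count), int(length), int(percent))
        for name, count, length, percent in STAT.findall(line)
    }


def floor_percent(count, length):
    """Percentage computed with truncating integer division (an assumption)."""
    return (100 * count) // length


line = "Identities = 63/185 (34%) , Positives = 95/185 (51%) , Gaps = 11/185 (5%)"
stats = parse_alignment_stats(line)

# True when every reported percentage equals the recomputed one.
consistent = all(
    floor_percent(count, length) == reported
    for count, length, reported in stats.values()
)
```

A mismatch between the recomputed and reported values is a convenient flag for digits that were damaged in transcription.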

There is also homology to SEQ ID 732: 

Identities = 39/54 (72%) , Positives = 44/54 (81%)

Query: 179 IiHKIEIRCYDYNKQSRRVAEKLGFrLEATIRDRKDNQDNRCvNLIYGLDRSEWE 232 

LHKIE+ CYDYNKQS+ VA KLGFTLEA RDRKD Q RC ++ +GLLRSEWE 
Sbjct: 1 MKIELGCYDYNKQSQAVARKLGFTLEANARDRKDVQGRRCGDMRFGLLRSEWE 54 


Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 231

A DNA sequence (GBSx0245) was identified in S.agalactiae <SEQ ID 735> which encodes the amino acid 
sequence <SEQ ID 736>. Analysis of this protein sequence reveals the following: 

Possible site: 51 
>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.2719 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 232 

A DNA sequence (GBSx0246) was identified in S.agalactiae <SEQ ID 737> which encodes the amino acid
sequence <SEQ ID 738>. Analysis of this protein sequence reveals the following:

Possible site: 53 
>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.3250 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9597> which encodes amino acid sequence <SEQ ID 9598> 
was also identified. 

The protein has no significant homology with any sequences in the GENPEPT database. 

A related DNA sequence was identified in S.pyogenes <SEQ ID 739> which encodes the amino acid
sequence <SEQ ID 740>. Analysis of this protein sequence reveals the following: 

Possible site: 38 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3293 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below:

Identities = 24/55 (43%) , Positives = 38/55 (68%) 

Query: 56 LLEGLTANKQDVLKE»GLVSLEAFAKVSEADVLALKGIGPAAIKQLVDNC3WFAK 110 
++ G+ ++ 4- L G+ S +AF + +E D+LALKGIGPA +K+LV+NG F K 
Sbjct: 77 WAGIRSDLVETIjYAEGIHSAQAFKEOTEKDLI^KGIGPATVKKLVENGASFKK 131

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 233

A DNA sequence (GBSx0247) was identified in S.agalactiae <SEQ ID 741> which encodes the amino acid 
sequence <SEQ ID 742>. Analysis of this protein sequence reveals the following: 
Possible site: 25 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.2901 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 

A related DNA sequence was identified in S.pyogenes <SEQ ID 743> which encodes the amino acid 
sequence <SEQ ID 744>. Analysis of this protein sequence reveals the following: 
Possible site: 27 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2535 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 57/84 (67%) , Positives = 73/84 (86%) 

Query: 1 MSYEQEFLKDFEEWLQSQISINQMATOSAKKVIiEEDKDERAADAYIRYESKLDAYRFLQG 60 

MSYE+EFLKDFE+W+++QI +NQ+AM ++++V +ED DERA DA+IRYESKLDAY FL G 
Sbjct: 1 MSYEKEFLKDFEDWVKTQIQWQLAMATSQEVAQEDGDERAKDAFIRYESKLDAYEFLLG 60 

Query: 61 KFNNYHNQKSFHDLPDGLFGQRHY 84
          KF+NY N K+FHD+PD LFG RHY
Sbjct: 61 KFDNYKNGKAFHDIPDELFGARHY 84

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 234 

A DNA sequence (GBSx0248) was identified in S.agalactiae <SEQ ID 745> which encodes the amino acid 
sequence <SEQ ID 746>. This protein is predicted to be methyltransferase. Analysis of this protein 
sequence reveals the following: 

Possible site: 61 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2469 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related DNA sequence was identified in S.pyogenes <SEQ ID 747> which encodes the amino acid 
sequence <SEQ ID 748>. Analysis of this protein sequence reveals the following: 



>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3352 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 26/60 (43%) , Positives = 37/60 (61%)

Query: 23 LKNERCPHPKL1NVLERKLEIILGDQKHILEXDSLISLSPQETHHLRAIENSKFLQIELD 82 

+ E P K+I VLE +L L DQK +L ++SLI++ Q+ HHL A + K LQ+ LD 
Sbjct: 42 ISQETSPRDKVILVLEGQLIFDLEDQKQVLTQESLIAIPAQKVHHLEAKTDCKLLQVLDD 101 


Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 235 

A DNA sequence (GBSx0249) was identified in S.agalactiae <SEQ ID 749> which encodes the amino acid 
sequence <SEQ ID 750>. This protein is predicted to be integrase (codV). Analysis of this protein sequence
reveals the following: 

Possible site: 59
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3842 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 236 

A DNA sequence (GBSx0250) was identified in S.agalactiae <SEQ ID 751> which encodes the amino acid 
sequence <SEQ ID 752>. Analysis of this protein sequence reveals the following:

Possible site: 22 

>>> May be a lipoprotein 

Final Results 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 

No corresponding DNA sequence was identified in S.pyogenes.

SEQ ID 752 (GBS128) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 23 (lane 5; MW 15kDa). It was also expressed in E.coli as a GST-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 32 (lane 4; 2 bands). 

The GBS128-GST fusion product was purified (Figure 198, lane 2) and used to immunise mice. The
resulting antiserum was used for FACS (Figure 288), which confirmed that the protein is immunoaccessible
on GBS bacteria.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 237 

A DNA sequence (GBSx0251) was identified in S.agalactiae <SEQ ID 753> which encodes the amino acid 
sequence <SEQ ID 754>. Analysis of this protein sequence reveals the following: 

Possible site: 30 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2940 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 

A related DNA sequence was identified in S.pyogenes <SEQ ID 755> which encodes the amino acid 
sequence <SEQ ID 756>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2518 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 30/90 (33%), Positives = 49/90 (54%), Gaps = 10/90 (11%) 

Query: 3 TVAWVDDQLKDDATELFQSLGLDMSTAVKMFL1QSVKTQSIPFEIK NKSSV 54 

T+ +RVDD +K A ++ + LG+ MSTA+ MFL Q + T IPF++ N + 

Sbjct: 15 TLNLRVDDSVKSAADDILKRLGIPMSTAIDMFIiNQIILTGGIPFDVSLPEAPQRvNVDYM 74 

Query: 55 SDEEFQNLVETKLKGIRVKASDPESVNAFF 84 

S E+F + + T + K 4-P+ V F+ 
Sbjct: 75 SQEKFYDKLITSFED- -AKTCNPQDVGKFY 102 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 238 

A DNA sequence (GBSx0252) was identified in S.agalactiae <SEQ ID 757> which encodes the amino acid 
sequence <SEQ ID 758>. This protein is predicted to be surface protein Rib. Analysis of this protein 
sequence reveals the following: 

Possible site: 24 

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -2.81 Transmembrane 370 - 386 ( 368 - 388) 

Final Results 

bacterial membrane --- Certainty=0.2126 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9593> which encodes amino acid sequence <SEQ ID 9594>
was also identified. A related GBS nucleic acid sequence <SEQ ID 10773> which encodes amino acid 
sequence <SEQ ID 10774> was also identified. 

A related DNA sequence was identified in S.pyogenes <SEQ ID 759> which encodes the amino acid
sequence <SEQ ID 760>. Analysis of this protein sequence reveals the following: 

Possible site: 37 
>>> Seems to have a cleavable N-term signal seq.

INTEGRAL Likelihood = -4.57 Transmembrane 354 - 370 ( 353 - 371) 

Final Results 

bacterial membrane --- Certainty=0.2825 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

LPXTG motif: 344-348 

An alignment of the GAS and GBS proteins is shown below: 

Identities = 54/277 (23%) , Positives = 99/277 (35%) , Gaps = 31/277 (11%) 

Query: 126 SIGNLPDLPKGTTVAFETPVDTATPGDKPAKVVVTYPDGSKDTvDVTVKVTO 185 

++ +LP +TTEPV +V +D+ + T PA 

Sbjct: 121 AVKDLPASTESTTQPVEAPVQETQASASDSMVTGDSTSVTTDSPEETPSSESPVAPALSE 180 

Query: 186 DPAGKDQQVNVGETPKAEDSIGNLPDLPKGTTVAFE1 
PA Q E P S P T A E r . 

Sbjct: 181 APA QPAESEEPSVAASSEETPS - - PSTPAAPE 1 ] 

Query: 246 DTVDVTVKVVDPRTDADKNDPAGKDQQVNVGETPKAEDSIGNLPDLPKGTTVAPETPVDT 305 

+T P A + PA ++ T + P P + +TP 

Sbjct: 235 ETPSPET PEEPAAPSQPAESEESSVAATTSPS -PSTPAESET- -QTPPAV 281 

Query: 306 ATPGDKPAKVWTYPDGSKDTVDVTVKVVDPRTDADK NDPAGKDQQVNGK 355 

DKP+ PS + TV+ + +DK N + + + 

Sbjct: 282 TKDSDKPSSAAEK-PAASSLVSEQTVQQPTSKRSSDKKEEQEQSYSPNRSLSRQVRAHES 340 

Query: 356 GNKLPATGENATPFFNWALTIMSSVGLLSVSKKKED 392 

G LP+TGE A P F + +T+MS G L V+K++++ 
Sbjct: 341 GKYLPSTGEKAQPLF- IATMTLMSLFGSLLVTKRQKE 375 
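
The LPXTG coordinates reported for the GAS protein can be checked against the aligned sequence. The following is a minimal sketch, assuming the motif is matched as L-P-any residue-T-G and that the first residue of a fragment carries the coordinate printed at the start of its alignment row; `find_lpxtg` is a hypothetical helper, and the fragment is the ungapped start of the final Sbjct row.

```python
import re

# L, P, any single residue, T, G.
LPXTG = re.compile(r"LP.TG")


def find_lpxtg(fragment, start=1):
    """Return (first, last) residue positions of each LPXTG match.

    `start` is the sequence coordinate of the first residue in `fragment`.
    """
    return [
        (match.start() + start, match.start() + start + 4)
        for match in LPXTG.finditer(fragment)
    ]


# First 15 residues of the Sbjct row that begins at position 341.
fragment = "GKYLPSTGEKAQPLF"
hits = find_lpxtg(fragment, start=341)
```

For this fragment the match is LPSTG, whose first and last residues fall at positions 344 and 348, in agreement with the motif position reported above.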

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 239 

A DNA sequence (GBSx0253) was identified in S.agalactiae <SEQ ID 761> which encodes the amino acid 
sequence <SEQ ID 762>. This protein is predicted to be surface protein Rib. Analysis of this protein 
sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.5289 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



No corresponding DNA sequence was identified in S.pyogenes. 



WO 02/34771 



PCT/GB01/04789 



Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 



Example 240 

A DNA sequence (GBSx0254) was identified in S.agalactiae <SEQ ID 763> which encodes the amino acid 
sequence <SEQ ID 764>. This protein is predicted to be surface protein Rib. Analysis of this protein 
sequence reveals the following: 

Possible site: 53 

>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -1.06 Transmembrane 39 - 55 ( 39 - 55) 



Final Results

bacterial membrane --- Certainty=0.1425 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



A related GBS nucleic acid sequence <SEQ ID 9591> which encodes amino acid sequence <SEQ ID 9592> 
was also identified. 

The protein differs significantly from U58333 in several places: 

Query: 157 TKPDGQVDIVNVSLTIYNSSALRDKIDEVKK----------KAED-----PKWDEGSRDK 201
           T PDG  D V+V++ + +     DK D   K          KAED     P   +G+
Sbjct: 683 TYPDGSKDTVDVTVKVVDPRTDADKNDPAGKDQQVNVGETPKAEDSIGNLPDLPKGTTVA 742

Query: 202 VLISLDDIKTDIDNNPK---TQSDIANKITEVTNLEKILVPRIPDADKNDPAGKDQQVNV 258
               +D   T  D   K   T  D +    +VT   K++ PR  DADKNDPAGKDQQVNV
Sbjct: 743 FETPVDTA-TPGDKPAKVVVTYPDGSKDTVDVT--VKVVDPRT-DADKNDPAGKDQQVNV 798

Query: 157 TKPDGQVDIVNVSLTIYNSSALRDKIDEVKK----------KAED-----PKWDEGSRDK 201
           T PDG  D V+V++ + +     DK D   K          KAED     P   +G+
Sbjct: 841 TYPDGSKDTVDVTVKVVDPRTDADKNDPAGKDQQVNVGETPKAEDSIGNLPDLPKGTTVA 900

Query: 202 VLISLDDIKTDIDNNPK---TQSDIANKITEVTNLEKILVPRIPDADKNDPAGKDQQVNV 258
               +D   T  D   K   T  D +    +VT   K++ PR  DADKNDPAGKDQQVNV
Sbjct: 901 FETPVDTA-TPGDKPAKVVVTYPDGSKDTVDVT--VKVVDPRT-DADKNDPAGKDQQVNV 956

Query: 157 TKPDGQVDIVNVSLTIYNSSALRDKIDEVKK----------KAED-----PKWDEGSRDK 201
           T PDG  D V+V++ + +     DK D   K          KAED     P   +G+
Sbjct: 288 TYPDGSKDTVDVTVKVVDPRTDADKNDPAGKDQQVNVGETPKAEDSIGNLPDLPKGTTVA 347

Query: 202 VLISLDDIKTDIDNNPK---TQSDIANKITEVTNLEKILVPRIPDADKNDPAGKDQQVNV 258
               +D   T  D   K   T  D +    +VT   K++ PR  DADKNDPAGKDQQVNV
Sbjct: 348 FETPVDTA-TPGDKPAKVVVTYPDGSKDTVDVT--VKVVDPRT-DADKNDPAGKDQQVNV 403



Query: 157 TKPDGQVDIVNVSLTIYNSSALRDKIDEVKK----------KAED-----PKWDEGSRDK 201
           T PDG  D V+V++ + +     DK D   K          KAED     P   +G+
Sbjct: 604 TYPDGSKDTVDVTVKVVDPRTDADKNDPAGKDQQVNVGETPKAEDSIGNLPDLPKGTTVA 663

Query: 202 VLISLDDIKTDIDNNPK---TQSDIANKITEVTNLEKILVPRIPDADKNDPAGKDQQVNV 258
               +D   T  D   K   T  D +    +VT   K++ PR  DADKNDPAGKDQQVNV
Sbjct: 664 FETPVDTA-TPGDKPAKVVVTYPDGSKDTVDVT--VKVVDPRT-DADKNDPAGKDQQVNV 719



Query: 157 TKPDGQVDIVNVSLTIYNSSALRDKIDEVKK----------KAED-----PKWDEGSRDK 201
           T PDG  D V+V++ + +     DK D   K          KAED     P   +G+
Sbjct: 446 TYPDGSKDTVDVTVKVVDPRTDADKNDPAGKDQQVNVGETPKAEDSIGNLPDLPKGTTVA 505

Query: 202 VLISLDDIKTDIDNNPK---TQSDIANKITEVTNLEKILVPRIPDADKNDPAGKDQQVNV 258
               +D   T  D   K   T  D +    +VT   K++ PR  DADKNDPAGKDQQVNV
Sbjct: 506 FETPVDTA-TPGDKPAKVVVTYPDGSKDTVDVT--VKVVDPRT-DADKNDPAGKDQQVNV 561

Sbjct: 920 TYPDGSKDTVDVTVKVVDPRTDADKNDPAGKDQQVNVGETPKAEDSIGNLPDLPKGTTVA 979

Query: 202 VLISLDDIKTDIDNNPK---TQSDIANKITEVTNLEKILVPRIPDADKNDPAGKDQQVNV 258
               +D   T  D   K   T  D +    +VT   K++ PR  DADKNDPAGKDQQVNV
Sbjct: 980 FETPVDTA-TPGDKPAKVVVTYPDGSKDTVDVT--VKVVDPRT-DADKNDPAGKDQQVNV 1035

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 241

A DNA sequence (GBSx0255) was identified in S.agalactiae <SEQ ID 765> which encodes the amino acid 
sequence <SEQ ID 766>. This protein is predicted to be ara-C-like activator. Analysis of this protein 
sequence reveals the following: 

Possible site: 30



Final Results 

bacterial membrane --- Certainty=0.1150 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9589> which encodes amino acid sequence <SEQ ID 9590>
was also identified.

There is homology to SEQ ID 460. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 242 

A DNA sequence (GBSx0256) was identified in S.agalactiae <SEQ ID 767> which encodes the amino acid 
sequence <SEQ ID 768>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1200 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9587> which encodes amino acid sequence <SEQ ID 9588> 
was also identified. 

A related DNA sequence was identified in S.pyogenes <SEQ ID 769> which encodes the amino acid 
sequence <SEQ ID 770>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.0679 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below:

Identities = 135/175 (76%) , Positives = 161/176 (90%) 

Query: 1 MSYMVKDRQIQKTKVAIYNAFISLLQENDYSKITVQDVIGLANVGRSTFYSHYESKEVLL 60

+S M KDRQI+KTK AI Y+AFI +LLQ+ +YSKITV+D+I LANVGRSTFY+HYESKE+LL 
Sbjct: 1 VSDMTKDRQIKKTKTAIYSAFIALLQKKEYSKITVRDMITLAWGRSTFYAHYESKEMLL 60 

Query: 61 KELCEDLFHHLFKQGRDVTFEEYLVHILKHFEQNQDSIATLLLSDDPYFLLRFRSELEHD 120

KELCE+LFHHLF+Q R+VTFE+YLWILKHFEQN+DSIATLLLS+DPYFLLRF++ELEHD 
Sbjct: 61 KELCEELFHHLFRQKRNVTFEDYLVHILKHFEQNKDSIATLLLSNDPYFLLRFKNELEHD 120 

Query: 121 VYPRLREEYITKVDIPEDFLKQFLLSSFIETLKWKLHQRQKMTVEDLLKYYLTMVE 176

VYP LR +YI K 1PE FLKQF+LSSFI3TLKWWLHQRQ+M+ +LLKYYL ++ + 
Sbjct: 121 VYPNLRCKYIDKTTIPEVFLKQFVLSSFIETLKWWLHQRQRMSANELLKYYLELIK 176 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 243 

A DNA sequence (GBSx0257) was identified in S.agalactiae <SEQ ID 771> which encodes the amino acid
sequence <SEQ ID 772>. Analysis of this protein sequence reveals the following: 

Possible site: 29 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.3573 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 244 

A DNA sequence (GBSx0258) was identified in S.agalactiae <SEQ ID 773> which encodes the amino acid 
sequence <SEQ ID 774>. Analysis of this protein sequence reveals the following: 

Possible site: 35 

>>> Seems to have a cleavable N-term signal seq.

INTEGRAL Likelihood =-10.19 Transmembrane 112 - 128 ( 107 - 131) 

INTEGRAL Likelihood = -8.07 Transmembrane 77 - 93 ( 71 - 97) 

INTEGRAL Likelihood = -6.10 Transmembrane 144 - 160 ( 138 - 165) 

INTEGRAL Likelihood = -3.03 Transmembrane 165 - 181 ( 164 - 182) 

Final Results 

bacterial membrane --- Certainty=0.5076 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>
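
Where several transmembrane segments are predicted, as for this protein, it can help to list them in sequence order. The sketch below is illustrative only: `parse_tm_segments` is a hypothetical helper, and because the meaning of the range in parentheses is not stated here, it is simply stored as `paren_span` without interpretation.

```python
import re

# Assumed layout:
# "INTEGRAL Likelihood = <score> Transmembrane <a> - <b> ( <c> - <d>)"
TM_LINE = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+\.\d+)\s+Transmembrane\s+"
    r"(\d+)\s*-\s*(\d+)\s*\(\s*(\d+)\s*-\s*(\d+)\s*\)"
)


def parse_tm_segments(text):
    """Return predicted segments sorted by the start of the first range."""
    segments = []
    for match in TM_LINE.finditer(text):
        segments.append(
            {
                "likelihood": float(match.group(1)),
                "span": (int(match.group(2)), int(match.group(3))),
                # Range given in parentheses; its meaning is not assumed here.
                "paren_span": (int(match.group(4)), int(match.group(5))),
            }
        )
    return sorted(segments, key=lambda segment: segment["span"][0])


example = """
INTEGRAL Likelihood =-10.19 Transmembrane 112 - 128 ( 107 - 131)
INTEGRAL Likelihood = -8.07 Transmembrane 77 - 93 ( 71 - 97)
INTEGRAL Likelihood = -6.10 Transmembrane 144 - 160 ( 138 - 165)
INTEGRAL Likelihood = -3.03 Transmembrane 165 - 181 ( 164 - 182)
"""
segments = parse_tm_segments(example)
```

Sorting by position rather than by likelihood makes it easier to compare the segment layout of this protein with that of a related protein.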

A related DNA sequence was identified in S.pyogenes <SEQ ID 775> which encodes the amino acid 
sequence <SEQ ID 776>. Analysis of this protein sequence reveals the following: 

Possible site: 35 
>>> Seems to have an uncleavable N-term signal seq
INTEGRAL Likelihood = -9.13
INTEGRAL Likelihood = -5.89 Transmembrane 144 - 160 ( 138 - 153)

INTEGRAL Likelihood = -5.47 Transmembrane 7 - 23 ( 6 - 29) 

INTEGRAL Likelihood = -3.50 Transmembrane 77 - 93 ( 74 - 94) 

INTEGRAL Likelihood = -2.07 Transmembrane 166 - 182 ( 165 - 183) 



Final Results 

bacterial membrane --- Certainty=0.4652 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 212/287 (73%) , Positives = 245/287 (84%) 

Query: 1 MTSNKKVAIAFILNISFSVLEFIFGSLFFSGAILADAVHDFGDAIAIGISATLEKKSKKD 60
M ++KKV I FILN+SFS++EFIFG+LFFSGAILADAVHDFGDAIAIGISA LE+K+ K
Sbjct: 1 MPASKCVTIIFILNLSFSLIEFIFGTLFFSGAILADAVHDFGDAIAIGISAILERKAVKK 60

Query: 61 EDTIFSLGYKRFSLLGALITSLILISGSILVMIENIPKLWHPTPVNYHGMFILAVIAIII 120
E FSLGYKRFSLLGAL T+LILISGS+LVMIE IPKLWHPT VNY GMF+LA+ All I



NG ASFI+HS Q+K+EEILSLHFLEDILGWLAII++SLIL WKP YILDPLLS+AI++FI 



LSKALPKL++T +FLDGVPDSIDY LH EL L + S+NQLN+WSMDGID+RA IHC 



-EK CK++IR ICQ Y IN VTVEID SL EHQ+HC L + 
:EKHCKKS IRLICQRYNINSVTVEIDTSLNEHQHHCSSLSS 287 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics.

Example 245 

A DNA sequence (GBSx0259) was identified in S.agalactiae <SEQ ID 777> which encodes the amino acid 
sequence <SEQ ID 778>. Analysis of this protein sequence reveals the following: 

Possible site: 48 
>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -1.22 Transmembrane 221 - 237 ( 221 - 237)








Final Results 

bacterial membrane --- Certainty=0.1489 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

There is also homology to SEQ ID 780. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 246 

A DNA sequence (GBSx0260) was identified in S.agalactiae <SEQ ID 781> which encodes the amino acid
sequence <SEQ ID 782>. Analysis of this protein sequence reveals the following: 

Possible site: 13 
>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -2.50 Transmembrane 2 - 18 ( 1 - 18)

Final Results

bacterial membrane --- Certainty=0.1999 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 247 

A DNA sequence (GBSx0261) was identified in S.agalactiae <SEQ ID 783> which encodes the amino acid 
sequence <SEQ ID 784>. This protein is predicted to be dehydrogenase (Zn-dependent). Analysis of this 
protein sequence reveals the following: 

Possible site: 15 

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -3.77 Transmembrane 171 - 187 ( 170 - 187) 

Final Results 

bacterial membrane --- Certainty=0.2508 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAG20655 GB:AE005134 alcohol dehydrogenase; Adh2 [Halobacterium
sp. NRC-1]

Identities = 168/348 (48%), Positives = 232/348 (66%), Gaps = 9/348 (2%)

Query: 1 MKVATFIEPGKMVITDTPKPVIEQETDAVIKIVRACVCGSDLWWYRGISKRESGSFAGHE 60
M+ A + PG++ + + PKP IE DAVI++ VCGSDLW+YRG S RE+GS GHE
Sbjct: 1 MRAAVYQGPGEIAVEEVPKPDIESPEDAVIRVTHTAVCGSDLWFYRGDSDREAGSRVGHE 60

AIGIVEEVGTKVTDV 
+GIVEEVG VT \i 



AVGLCGV+AA+ LGA RIIAM H+DR ELA FGATD + RGD+A++R DLT+ GA 



• V+ECVG ++D+A IARPG +G VG+P ++ ++ +F NI +RGG+A V 



H-+ D VL ++P +FTK+ LD + + Y AMD R+AIK LV 
5ELMAD-VLQGTLDPSPIFTKTVDLDGVPEGYAAMDDREAIKVLV 345 

There is also homology to SEQ ID 786. 

A related sequence was also identified in GAS <SEQ ID 9145> which encodes the amino acid sequence 
<SEQ ID 9146>. Analysis of this protein sequence reveals the following: 

Possible site: 23 



Final Results

bacterial membrane --- Certainty=0.3166 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 121/353 (34%) , Positives = 182/353 (51%) , Gaps = 16/353 (4%) 

Query: 1 MKVATFIEPGKMVITDTPKPVIKQETDAVIKIVRACVCGSDLWWYRG-ISKRESGSFAGH 59 

MK AT+4- G + + D PKPVI + TDA++++V+ +CG+DL G + + G+ GH 
Sbjct: 15 MKAATYLSTGNLQLIDKPKPVIIKPTDAIVQLVKTTICGTDLHILGGDVPACKEGTILGH 74 

Query: 60 EAIGIVEEVGTKVTDVSKGDFVIVPFTHGCGQCPSCKAGFDGNCTNHQAAKN VGYQG 116 

E IGIV+EVG VT+ GD VI+ C C CK G +C + G Q 

Sbjct: 75 EGIGIVKEVGDAVTNFKIGDKVIISCVTSCHTCYYCKRGLSSHCQDGGWILGHLINGTQA 134 

Query: 117 QYLRYTNANWALVKlPGQPSDYDNETI^SLLTLSDVmTGYH-AAATAEVKEGDTVVVMG 175 

+Y+ +A+ +L P D +L+ LSD++ T Y + VK GD V ++G 

Sbjct: 135 EYVHI PHADGSLYHAPDTIDD EALVMLSDILPTSYEIGVLPSHVKPGDNVCIVG 188 

Query: 176 DGAVGLCGVIAAKMLGANRIIAMSRHKDRQELALTFGATDIVEERGDEAVKRVL-DLTNQ 234 

G VGL ++ + II + ++R E A TFGAT + E VK ++ D+TN 

Sbjct: 189 AGPVGIAALLTVQFFSPANIIMVDLSQNRLFJiAKTFGATHTICSGSSEEVKAIIDDITNG 248 

Query: 235 AGADAVLECVGTEQSVDTATQIARPGAVIGRVGI PQNP - DMNTNNLFWKNIGLRGGIASV 293 

G D +ECVG + D +1 G I VG+ P D N + L+ KNI L G+ + 
Sbjct: 249 RGVDISMECVGYPATFDICQKIISVGGHIANVGVHGKPTOF^DELWIKNITLNTGLVNA 308 

Query: 294 TTFDKSVLLDAVLTHKINPGLVFTKSFVLDDIQKAYEAMDKRDAIKSL-VIVD 345 

T + +LL+ + T KI+ + T F L +++KAYE A +L VI+D 

Sbjct: 309 ISTTTE- -MLLNVLKTGKIDATRLITHHFKLSEVEKAYETFKHAGANNALKVI ID 359 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 248 

A DNA sequence (GBSx0262) was identified in S.agalactiae <SEQ ID 787> which encodes the amino acid 
sequence <SEQ ID 788>. Analysis of this protein sequence reveals the following: 

Possible site: 31 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2169 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database:

>GP:AAD36075 GB:AE001762 hypothetical protein [Thermotoga maritima] 
Identities = 55/128 (42%) , Positives = 72/128 (55%) , Gaps = 8/128 (6%) 

Query: 8 IFPKGEKNPYGEFFIGQSYLAALAKSPDG — NVSVGNVTFEAGCRNNWHVHIiDGYQILLV 65 
IF +G K +FF G ++ L +G N V +V FE G R +WH H G QIL+V

Sbjct: 5 IFERGSKGS-SDFFTGNVWVKMLVTDENGVFNTQVYDWFEPGARTHWHSHPGG-QILIV 62 

Query: 66 TEGSGWYQEEGKEAVSLKPGDVIVTDKGVRHWHGAKKDSEFAHIAITA GKSEFYEA 121 

T G G+YQE GK A LK GDV+ V HWHGA D E HI 1+ G +E+ + 

Sbjct: 63 TRGKGFYQERGKPARILKKGDVVEIPPNVVHWHGAAPDEELVHIGISTQVHLGPAEWLGS 122

Query: 122 VSDEEYSR 129

V++EEY + 
Sbjct: 123 VTEEEYRK 130 



No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 249 

A DNA sequence (GBSx0263) was identified in S.agalactiae <SEQ ID 789> which encodes the amino acid 
sequence <SEQ ID 790>. This protein is predicted to be gamma-carboxymuconolactone decarboxylase.
Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4089 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database:

>GP:CAA20070 GB:AL031155 3-oxoadipate enol-lactone
hydrolase/4-carboxymuconolactone decarboxylase
[Streptomyces coelicolor A3(2)]
Identities = 33/93 (35%) , Positives = 59/93 (62%) , Gaps = 1/93 (1%)

Query: 11 QLEEFAPEFARYNDDILFGEVWAKEDHLTDKTRSIITISALISGGNLEQLEHHLQFAKQN 70 

Q +EF+ +F + +GE+W + L ++RS +T++AL++GG+L++L HL+ A +N 

Sbjct: 349 QADEFSGDFQEFLTRYAWGEITORPG-I£)RRSRSCVTLTALVAGGHLDELAPHLRAALRN 407 

Query: 71 GVTKEEIADI 1THIAFYVGWPKAWSAFNKRKEI 103

G+T EI +++ A Y G P A AF A+++ 
Sbjct: 408 GLTPGEIKEVLLQAAVYCGVPAAKGAFRVAQQV 440 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 250 

A DNA sequence (GBSx0265) was identified in S.agalactiae <SEQ ID 791> which encodes the amino acid 
sequence <SEQ ID 792>. Analysis of this protein sequence reveals the following: 

Possible site: 44

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.5529 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 251 

A DNA sequence (GBSx0266) was identified in S.agalactiae <SEQ ID 793> which encodes the amino acid 
sequence <SEQ ID 794>. This protein is predicted to be a probable transcriptional regulator. Analysis of this
protein sequence reveals the following:

Possible site: 58 

>>> Seems to have an uncleavable N-term signal seq 

Final Results

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9585> which encodes amino acid sequence <SEQ ID 9586>
was also identified.

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAG08263 GB:AE004901 probable transcriptional regulator 
[Pseudomonas aeruginosa] 
Identities = 36/148 (24%), Positives = 68/148 (45%), Gaps = 22/148 (14%)

Query: 5 QIVEKPAMILAG VTLEKP7KSNQEGIQQAIGICKTQPDFRFD 45 

+IVE+PA + G +E++++ + GIC QP+ F 

Sbjct: 123 RIVERPAFSWGMEYFGSAPGDTIGQLWERFIPREHEIAGKHDPKVSYGICAQQPNGEFH 182 


Query: 46 YSATyQVETSVQAPKGLEIIRIPSATYAVISVKGPMPSSLQETWRKIIQGFFQENNLKPA 105 

Y A ++V+ P+G+ ++P+ YAV + KG P + E+++ I E L+P 

Sbjct: 183 YVAGFEVQEGWPVPEGMVRFQVPAQKYAVFTKKGTAP-QIAESFQAIYSHLLAERGLEPK 241 

Query: 106 NSPNLEIYSSQH--PQDTDYQMEIWLAI 131

+ E Y + P D + Q+++++ I 
Sbjct: 242 AGVDFEYYDQRFRGPLDPNSQVDLYIPI 269 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 252 

A DNA sequence (GBSx0267) was identified in S.agalactiae <SEQ ID 795> which encodes the amino acid 
sequence <SEQ ID 796>. Analysis of this protein sequence reveals the following: 

Possible site: 24

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.0887 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database:

>GP:AAB84919 GB:AE000825 conserved protein [Methanothermobacter
thermoautotrophicus]
Identities = 42/130 (32%), Positives = 71/130 (54%), Gaps = 3/130 (2%)

Query: 1 MITQEMKEIINSQLAMVATVDAKGQPNIGPKRSMRLl'JDDKTFIYNENTDGQTRINIEDNG 60

M+T EM + I +L VAT D +G PN+ P R D++T + +N +T N+ +N 
Sbjct: 1 MMTPEMMDAIEKELVFVATADEEGTPNWPIGPARPLDERTILIADKYMKKTIRNLHENP 60 

Query: 61 KIEIAFVDRERLLGYRFVGTAEIQTEGTYYEAAKKIVAEGRMG--VPKAVGIIHVERIFNL 118 

+1 + R Y+F GT EI G Y++ +WA+ M PK+ ++ VE I+++ 

Sbjct: 61 RIAL-3PQNARECPYQFKGTVEIFKSGKYFDMVVEWAQNVMTELEPKSAILMTVEEIYSV 119 



Query: 119 
Sbjct: 120 

A related DNA sequence was identified in S.pyogenes <SEQ ID 797> which encodes the amino acid 
sequence <SEQ ID 798>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence



Final Results 

bacterial cytoplasm --- Certainty=0.0789 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 123/128 (96%), Positives = 127/128 (99%) 

Query: 1 MITQEMKEIINSQLAMVATVDAKGQPNIGPKRSMRLVTODKTFIYNENTDGQTRINIEDNG 60 

MITQEMK++IN^■QIA^WATvI)AKGQPNIGPKRSMRLWDDKTFIYNENTDGQTRINIEDNG 
Sbjct: 1 MITQEMKDLINNQLMWATVDAKGQPNIGPKRSMRMvrDDKTFIYNENTDGQTRINIEDNG 60 

Query: 61 KIEIAFVDRERLLGYRFVGTAEIQTEGTYYEAAKKMAEGRMGVPKAVGIIHVERIFNLQS 120 

KIE I AFVDRERLLGYRFVGTAE IQTEG YYEAAKKWA+GRMGVPKAVGI I HVERI FNLQS 
Sbjct: 61 KIEIAFVDRERLLGYRFVGTAEIQTEGAYYEAAKKWAQGRMGVPKAVGIIHVERIFNLQS 120 



Query: 121 G 

G 

Sbjct: 121 GANAGKEI 128 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 253 

A DNA sequence (GBSx0268) was identified in S.agalactiae <SEQ ID 799> which encodes the amino acid 
sequence <SEQ ID 800>. Analysis of this protein sequence reveals the following: 

Possible site: 22

>>> Seems to have a cleavable N-term signal seq.

INTEGRAL Likelihood = -5.47 Transmembrane 1028-1044 (1027-1048)

Final Results

bacterial membrane --- Certainty=0.3187 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

!GB:AF054892 surface antigen BspA [Bacteroides forsy...
!GB:AF054892 surface antigen BspA [Bacteroides forsy... 
!GB:AF054892 surface antigen BspA [Bacteroides forsy... 
!GB:AF054892 surface antigen BspA [Bacteroides forsy... 
!GB:AF054892 surface antigen BspA [Bacteroides forsy... 

>GP:AAC82625 GB:AF054892 surface antigen BspA [Bacteroides 
= 243/566 (42%), Gaps = 52/566 (9%)

Query: 95 VPKAKPEVTQEASNSSNDASKVEVPKQDTASKKETLETSTWERKDFVTRGDTnVG F 150 

+P+++A + ++PTA + LT + T +G F 

Sbjct: 120 IPNSVTTIGEWAFKGCSGLKSITLPNSLTAIGQSALSGCTGLTSITIPNSVTTIGEWAFF 179 

Query: 151 SKSGINKLSQTSHLVLPSHAR--DGTQLTQVASFAFTPDKKTAIAEYTSRLGENGKPSRL 208 

SG+ ++ + L +A LT + PD TIE + G +G S 

Sbjct: 180 GCSGLTSITFPNSLTAIGESAFYGCGALTSIT LPDALTTIGESAFK-GCSGLKS1T 234 

Query: 209 DIDQKEIIDEGEIFNAYQLTKLTIPNGYKSIGQDAFVDNKNIAEVNLPESLETISDYAFA 268 

+ IE ++ LT +T+P+ +IG+ AF + + P SL TI + AF 

Sbjct: 235 FPNSLTTIGESAFYDCGALTSITLPDALTTIGRSAFYGCSGLKSITFPNSLTTIGESAFY 294 

Query: 269 HM-SLKQVKLPDNLKVIGELAFFDNQIGGKLYLPRHLIKIAERAFKSNRIQTVEFLGSKL 327 

+ SL + +P+++ IG AF+ + LP li + ERAF + + T +'+ + 

Sbjct: 295 NCGSLTSITIPNSVTTIGRSAFYGCSGLKSITLPDGLTTIEERAFYNCGVLTSITIPNSV 354 

Query: 328 KVIGEASFQD-M^RNVMLPDGLEKIESEAFTGNPGDEHYNNQVVLRTRTGQNPHQLATE 386 

IGE++F + L+++ LPDGL IE AF ML + TN E 

Sbjct: 355 ATIGESAFYGCSGLKSITLPDGLTTIEWGAFY NCGALTS ITI PKtSVSTIGE 405 

Query: 387 NTYVNPDKSLWRATPDMDYTKWLEEDFTYQKNSVTGFS NKGLQKVRRNKNLEIPKQH 443 

+ + +L T D ++ D +++ +++G G + V K ++ K+ 

Sbjct: 406 SAFYGCG-ALKDVTVATOTPIDIQRD-VFRELTLSGIRLHVPAGKKTVYEAK~-DVWKEF 461 



- - PTPDTPKPMPNFATPNDQLW 507 

Query: 504 EIKEGAFMNNRIGTLDLKDKLIKIGDAAFH-INHIYAIVLPESVQEIGRSAFRQNGALHL 562 

GAF I + + D + +GD AF + + +1 LP+SV IG+SAF L 
Sbjct: 508 GAFQKE-IQKITIGDGVTSVGDFAFSGCDALKSITLPKSVTTIGQSAFSGCWDLRS 562 

Query: 563 MFIGNKVKTIGEMAFLSNKLESVNLSEQKQLKTIEVQAFS-DNALSEVVLPPNLQTIREE 621 

+ + + V TIGE AF + LE +++ K + I + F +L+ + LP L I ++ 
Sbjct: 563 LTLPDGVNTIGEKAFY-DCLELTSITIPKSVTAIGQETFHYCVSLTSLTLPDALTAIGKK 621 

Query: 622 AF-KRNHLKEVKGSSTLSQITFNAFD 646 

AF N L V +++ I NAFD 
Sbjct: 622 AFYSCNALTSVTFPKSITTIGENAFD 647 
Identities = 109/407 (26%), Positives = 175/407 (42%), Gaps = 48/407 (11%) 

Query: 222 FNAYQLTKLTIPNGYKSIGQDAFVDNKNIAEVNI.PESLETISDYAFAHMS-LKQVKLPDN 280 

F+ LT +T+PN +IG AF + + +P S+ TI ++AF S LK + LP++ 

Sbjct: 87 FSDCALTSvTLPNSLTAIGDHAFKGCSGLTSITIPWSVTTIGEWAFKGCSGLKSITLPNS 145 

Query: 281 LKVIGEIAFFDNQIGGKLYLPRHLIKLAERAFKSNRIQTVEFLGSKLKVIGEASFQD-NN 339 
L IG+ A + +P + + E AF T + L IGE++F 

Sbjct: 147 LTAIGQSALSGCTGLTSITIPNSVTTIGEWAFFGCSGLTSITFPNSLTAIGESAFYGCGA 205

Query: 340 LRNVMLPDGLEKIESFAFTGNPGDEHYIWQWLRTRTGQNPHQLATENTYVNPDKSLTOA 399 

L ++ LPD h I AF G G L++ T N E+ + + 

Sbjct: 207 LTSITLPDALTTIGESAFKGCSG LKSITFPNSLTTIGESAFYDCGALTSIT 257 

Query: 400 TPDmYTKVOJEEDFTYQKNSVTGFSNKGLQKVRRNKNLEIPKQHNGITITEIGDNAFRNV 459 

PD +4T K++ P ++T IG++AF N 

Sbjct: 258 LPD ALTTIGRSAFYGCSGLKSITFPN SLTTIGESAFYNC 296 

Query: 460 DFQSKTLRKYDLEEIKLPSTIRKIGAFAFQS-NNLKSFEASEDLEEIKEGAFMNNRIGT- 517 

L I +P+++ IG AF + LKS + L I+E AF N + T 
Sbjct: 297 G SLTSITIPNSVTTIGRSAFYGCSGLKSITLPDGLTTIEERAFYNCGVLTS 347 

Query: 518 LDLKDKLIKIGDAAFH-INHIYAIVLPESVQEIGRSAFRQNGALHLMFIGNKVKTIGEMA 576 

+ + + + IG++AF+ + + +1 LP+ + I AF GAL + I N V TIGE A 
Sbjct: 348 ITIPNSVATIGESAFYGCSGLKSITLPD3LTTIEWGAFYNCGALTSITIPNSVSTIGESA 407 
Query: 577 FLS-NKLESVNLSEQKQLKTIEVQAFSDNALSEWL--PPNLQTIRE 620

F L+ V ++ +1+ F + LS + L P +T+ E 

Sbjct: 408 FYGCGALKDVTVAWDTPI -DIQRDVFRELTLSGIRLHVPAGKKTVYE 453 
Identities = 111/465 (23%), Positives = 185/465 (38%), Gaps = 56/465 (12%) 

Query: 141 VTRGDTLVGFSKSGINKLSQTSHLVLPSHAADGTQLTQV&SFAF TPDKKT 190 

+T D L +S S + P+ LT + AF PD T 

Sbjct: 210 ITLPDALTTIGESAFKGCSGLKSITFPN -SLTTIGESAFYDCGALTSITLPDALT 263 

Query: 191 AIAEYTSRLGENGKPSRLDIDQKEIIDEGEIFNAYQLTKLTIPNGYKSIGQDAFVDNKNI 250 

I ++ G +G S + I E +N LT +TIPN +IG+ AF + 
Sbjct: 264 TIGR-SAFYGCSGLKSITFPNSLTTIGESAFYNCGSLTSITIPNSVTTIGRSAFYGCSGL 322 

Query: 251 AEVOTjPESLETISDYAFAHMS-LKQVKLPDI^KVIGEIiAFFDNQIGGKLYLPRHLIKLAE 309 

+ LP+ L TI + AF + L + +P+++ IGE AF+ + LP L + 

Sbjct: 323 KSITLPDGLTTIEERAFYNCGlTjTSITIPNSVATIGESAFYGCSGLKSITLPDGLTTIEW 382 

Query: 310 RAFKSNRIQTVEFLGSKLKVIGEASFQD-NNLRNVMLP-DGLEKIESEAF TGNPG 362 

AF + T + + + IGE++F L++V + D 1+ + F +G 

Sbjct: 383 GAFYNCGALTSITIPNSVSTIGESAFYGCGALKDVT\?AWDTPIDIQRDVFRELTLSGIRL 442 

Query: 363 DEHYNNQWLRTRTGQNPHQLATEN TYVNPDKSLWRATPDMDYTKWLEEDFTY 415 

+ V + + ++ Y K+L P D K + +F 

Sbjct: 443 HVPAGKKTVYEAKDVWKEFNIVEDDDFGGLQKNYDAATKTLTITNPTPDTPKPM-PNFAT 501 

Query: 416 QKNSVTGFSNKGLQKVPJ^KNLEIPKQHNGITITEIGDNAFRNVDFQSKTLRKYDLEEIK 475 

+ + G K G +T +GD AF D L+ I 

Sbjct: 502 PNDQLWGAFQKEIQKIT IGDGVTSVGDFAFSGCD ALKSIT 541 

Query: 476 LPSTIRKIGAFAFQSN-NLKSFEASEDLEEIKEGAFMN-NRIGTLDLKDKLIKIGDAAFH 533 

LP ++ IG AF +L+S + + IE AF + + ++ + + IG FH 
Sbjct: 542 LPKSVTTIGQSAFSGCWDLRSLTLPDGVNTIGEKAFYDCLELTSITIPKSVTAIGQETFH 601 

Query: 534 -INHIYAIVLPESVQEIGRSAFRQNGALHLMFIGNKVKTIQEMAF 577 

+ ++ LP+++- IG+ AF AL + + TIGE AF 

Sbjct: 602 YCVSLTSLTLPDALTAIGKKAFYSCNALTSVTFPKSITTIGENAF 646 
Identities = 98/351 (27%) , Positives = 152/351 (42%) , Gaps = 53/351 (15%) 



Sbjct: 68 SKIQTVT-IGDGVTSVGNNAFSDCALTSVTLPNSLTAIGDHAFKGCSG LTS 117 

Query: 375 RTGQNPHQIATEOTYVNPDKSLWRATPDMDYTraiLEEDFTYQKNSVTGFSNKGLQKVRRN 434 

T P+ + T + S ++ NS+T L 

Sbjct: 118 IT- - IPNSVTTIGEWAFKGCSGLKS IT LPNSLTAIGQSALSGCTGL 161 

Query: 435 KNLEIPKQHNGITITEIGDNAF RNVDFQSKTLRKYD LEEIKLPSTI 480 

++ IP ++T IG+ AF ++ F + + L I LP + 

Sbjct: 162 TSITIPN SVTTIGEWAFFGCSGLTSITFPNSLTAIGESAFYGCGALTSITLPDAL 216 

Query: 481 RKIGAFAFQS-HNLKSFEASEDLEEIKEGAFMN-NRIGTLDLKDKLIKIGDAAFH-INHI 537 

IG AF+ + LKS L IE AF + + ++ L D L IG +AF+ + + 

Sbjct: 217 TTIGESAFKGCSGLKSITFPNSLTTIGESAFYDCGALTSITLPDALTTIGRSAFYGCSGL 276 

Query: 538 YAIVLPESVQEIGRSAFRQNGALHLMFIGNKVKTIGEMAFLS-NKLESVNLSEQKQLKTI 596 

+1 P S+ IG SAF G+L + I N V TIG AF + L+S+ L + L TI 
Sbjct: 277 KBITFPNSLTTIGESAFYNCGSLTSITIFNSVTTIGRSAFYGCSGLKSITLPD--GLTTI 334 

Query: 597 EVQAFSD-NALSEWLPPNLQTIREEAFKR-NHLKEVKGSSTLSQITFNAF 645 

E +AF + L+ + +P +4 TI E AF + LK + L+ I + AF 

Sbjct: 335 EERAFYNCGVLTSITIPNSVATIGESAFYGCSGLKSITLPDGLTTIEWGAF 385 
Identities = 78/282 (27%) , Positives = 123/282 (42%) , Gaps = 46/282 (16%) 

Query: 111 NDASICVEVPKQDTASKKETLETSTWEAKDFVTRGDTLVGFSKSGINKLSQTSHLVLPS-- 168 

N+AS E+P SK +T vT GD + + + + TS + LP+ 

Sbjct: 56 NNAS--EIPWHSLQSKIQT VTIGDGVTSVGNNAFSDCALTS- VTLPNSL 101 

Query: 169 HAADG TQLTQVASFAFT PDKKTAIAEYTSRLGENG 203 
HA G +T + +AF P+ TAI + ++ G G

Sbjct: 102 TAIGDHAFKGCSGLTSITIPNSOTTIGEWAFKGCSGLKSITLPNSLTAIGQ-SALSGCTG 160 

Query: 204 KPSRLDIDQKEIIDEGEIFNAYQLTKLTIPNGYKSIGQDAFVDNKNIAEVNLPESLETIS 263
S + I E F LT +T PN +IG+ AF + + LP++L TI

Sbjct: 161 LTSITIPNSVTTIGEWAFFGCSGLTSITFPNSLTAIGESAFYGCGALTSITLPDALTTIG 220 

Query: 264 DYAFAHMS-LKQVKLPDNLKVIGEIAFFDNQIGGKLYLPRHLIKLAERAFKS-NRIQTVE 321
+ AF S LK + P++L IGE AF+D + LP L + AF + ++++ 

Sbjct: 221 ESAFKGCSGLKSITFPNSLTTIGESAFYDCGALTSITLPDALTTIGRSAFYGCSGLKSIT 280

Query: 322 FLGSKLKVIGEASFQD-NNLRNVMLPDGLEKIESEAFTGNPG 362 

F S L IGE++F + +L ++ +P+ +1 AF G G 
Sbjct: 281 FPNS-LTTIGESAFYNCGSLTSITIPNSVTTIGRSAFYGCSG 321 
Identities = 43/144 (29%), Positives = 70/144 (47%), Gaps = 4/144 (2%)



Query: 220 EIFNAYQ- -LTKLTIPNGYKSIGQDAFVDNKNIAEVNLPESLETISDYAFAHM-SLKQVK 276 

+++ A+Q + K+TI +G S+G AF + + LP+S+ TI AF+ L+ + 

Sbjct: 505 QLWGAFQKEIQKITIGDGVTSVGDFAFSGCDALKSITLPKSVTTIGQSAFSGCWDLRSLT 564 

Query: 277 LPDNLKVIGEIAFFDNQIGGKLYLPRHLIKLAERAFKSNRIQTVEFLGSKLKVIGEASFQ 336 

LPD + IGE AF+D + +P+ + + + F T L L IG4 4F 

Sbjct: 565 LPDGVNTIGEKAFYDCLELTSITIPKSVTAIGQETFHYCVSLTSLTLPDALTAIGKKAFY 624 

Query: 337 D-NNLRNVMLPDGLEKIESEAFTG 359 

N L +V P 4 I AF G 
Sbjct: 625 SCNALTSVTFPKSITTIGENAFDG 648 
Identities = 43/134 (32%), Positives = 66/134 (49%), Gaps = 12/134 (8%) 

Query: 511 MNNRIGTLDLKDKLIKIGDAAFHINHIYAIVLPESVQEIGRSAFRQNGALHLMFIGNKVK 570 

+ 4+1 T+ + D + +G4 AF 4 44 LP S4 IG AF4 L 4 I N V 
Sbjct: 66 LQSKIQTVTIGDGOTSVGNNAFSDCALTSVTLPNSLTAIGDHAFKGCSGLTSITIPKSVT 125 

Query: 571 TIGEMAFLS-NKLESVNLSEQKQLKTIEVQAFSD-NALSEWLPPNLQTIREEAFKRNHL 628 

TIGE AF 4 L4S4 L LI AS L4 4 4P 44 TI E AF 
Sbjct: 126 TIGEWAFKGCSGLKSITL--PNSLTAIGQSALSGCTGLTSITIPNSVTTIGEWAF 178 

Query: 629 KEVKGSSTLSQITF 642 

G S L4 ITF 
Sbjct: 179 FGCSGLTSITF 189 

A related DNA sequence was identified in S.pyogenes <SEQ ID 801> which encodes the amino acid
sequence <SEQ ID 802>. Analysis of this protein sequence reveals the following: 

Possible site: 21
>>> Seems to have a cleavable N-term signal seq.

INTEGRAL Likelihood = -2.44 Transmembrane 984-1000 (984-1001)

Final Results

bacterial membrane --- Certainty=0.1977 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

LPXTG motif: 975-979 
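
The LPXTG motif reported above is the pentapeptide Leu-Pro-X-Thr-Gly, where X is any residue. A short Python sketch of how such positions could be located in a protein string is given below; it is an illustration only, `find_lpxtg` is a hypothetical helper, and the test sequence is invented rather than taken from any SEQ ID.

```python
import re

# "." stands for the variable residue X in L-P-X-T-G.
LPXTG = re.compile(r"LP.TG")


def find_lpxtg(sequence):
    """Return 1-based inclusive (start, end) positions of each LPXTG match."""
    return [(m.start() + 1, m.start() + 5) for m in LPXTG.finditer(sequence)]


# Invented toy sequence, used only to show the coordinate convention.
toy = "MKKLPETGAAA"
print(find_lpxtg(toy))  # [(4, 8)]
```

The 1-based inclusive coordinates match the five-residue span convention of the reported range (975-979).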


An alignment of the GAS and GBS proteins is shown below: 

Identities = 751/1050 (71%) , Positives = 861/1050 (81%) , Gaps = 45/1050 (4%) 

Query: 3 KKHLKTLALALTTVSVVTYSQEVYGLEREESVKQEQTQSA- SEDDWFEEDNERKTNVSKE 61 
KKHLKT+AL LTTVS WT+ +QEV+ L +E +RQ Q S+ S D+ E + K +++

Sbjct: 2 KKHLKWALTLTTVSVVTHNQEVFSLVKEPILKQTQASSSISGADYAESSGKSKLKINET 61 



Query: 62 NSTVDETVSDLFSDGNSNNSSSKTES WSDPKQVPKAKPEVTQEASNSSNDASKVEVPKQ 121
+ VD+TV+DLFSD + K +Q KA E T E+ S++E K+
Sbjct: 62 SGPVDE3TVTDLFSDKRTTPEKIKDNLAKGPREQELKAVTENT-ESEKQITSGSQLEQSKE 120

Query: 122 DTASKKETLETSTWEAKDFVTRGDTLVGFSKSGINKLSQTSHLVLPSHAADGTQLTQVAS 181

+ K TS WE DF+T+G+TLVG SKSG+ KLSQT HLVLPS AADGTQL QVAS 
Sbjct: 121 SLSLNKTVPSTSNWEICDFITKGNTLVGLSKSGVEKLSQTDHLVLPSQAADGTQLIQVAS 180 

Query: 182 FAFTPDKKTAIAEYTSRLGENGKPSRLDIDQKEIIDEGEIFNAYQLTKLTIPNGYKSIGQ 241 

FAFTPDKKTAIAEYTSR GENG+ S+LD+D KEII+EGE+FN+Y L K+TIP GYK IGQ 
Sbjct: 181 FAFTPDKKTAIAEYTSRAGENGEISQLDVDGKEIINEGEVFNSYLLKKVTIPTGYKHIGQ 240 

Query: 242 DAFVDNKN1AEVNLPESLETISDYAFAHMSLKQVKLPDNLKVIGELAFFDNQIGGKLYLP 301 

DAFVDNKNIAEVNLPESLETISDYAFAH++LKQ+ LPDNLK IGELAFFDNQI GKL LP 
Sbjct: 241 DAFVDNKNIAEVNLPESLETISDYAFAHLALKQIDLEDNLKAIGELAFFDNQITGKLSLP 300

Query: 302 RHLIKIAERAFKSmiQTWFLGSKLKV'IGEASFQDHWLRNVMLPDGLEKIESEAFTGNP 361 

R L++LAERAFKSN I+T+EF G+ LKVTGEASFQDN+L +MLPDGLEKIESEAFTGNP 
Sbjct: 301 RQLMRLAERAFKSNHIKTIEFRGNSLKVIGEASFQDNDLSQLMLPDGLEKIESEAFTGNP 360 

Query: 362 GDEHYIWQVVLRTRTGQNPHQIATEmYVNPDKSLWRATPDMDYTKWLEEDFTYQKNSVT 421 

GD+HYNN+WL T++G+NP IATENTYVNPDKSLW+ +P++DYTKWLEEDFTYQKNSVT 
Sbjct: 361 GDDHYN^VVLIVTKSGKNPSGIATENTYVNPDKSLWQESPEIDYTKWLEEDFTYQKNSVT 420 

Query: 422 GFSNKGLQKVRRNKNLEIPKQHNGITITEIGDKAFRNVDFQSKTLRKYDLEEIKLPSTIR 481 

GFSNKGLQKV+RNKNLE1PKQHNG+TIT3IGDNAFRNVDFQ+KTLRKYDLEE+KLPSTIR 
Sbjct: 421 GFSNKGLQKVKRNKNLEIPKQHNGVTITSIGDNAFRNVDFQNKTLRKYDLEEVKLPSTIR 480 

Query: 482 KIGAFAFQSHNLKSFEASEDLEEIKEGAFMNNRIGTLDLKDtCLIKIGDAAFHINHIYAIV 541 

KIGAFAFQSNNLKSFEAS+DLEEIKEGAFMNNRI TL+LKDKL+ IGDAAFHINHIYAIV 
Sbjct: 481 KIGAFAFQSNNLKSFEASDDLEEIKEGAFMNNRIETLELKDKLVTIGDAAFHINHIYAIV 540 

Query: 542 LPESVQEIGRSAFRQNGALHLMFIGNKVKTIGEMAFLSNKLESVNLSEQKQLKTIEVQAF 601 

LPESVQEIGRSAFRQNGA +L+F+G+KVKT+GEMAFLSN+LE ++LSEQKQL I VOAF 
Sbjct: 541 LPESVQEIGRSAFRQNGANNLIFMGSKVKTLGEMAFLSNRljEHLDI.SEQKQLTEIPVQAF 600 

Query: 602 SDNALSEVVLPPNLQTIREEAFKRimL[03VKGSSTLSQITETNIAFDQNDGDKRFGKKVVVR 6S1 

SDNAL EV+LP +L+TIREEAFK+NHLK+++ +S LS I FNA D NDGD++F KVW+ 
Sbjct: 601 SDNALKEVLLPASLKTIREEAFKKiraLKQLEVASALSHIAFNALDDNDGDEQFDNKVAA/K 660 

Query: 662 THNNSHMLADGERFIIDPDKLSSTMVDLEKVLKIIEGLDYSTLRQTTQTQFREMTTAGKA 721 

TH+NS+ LADGE FI+DPDKLSST+VDLEK+LK+IEGLDYSTLRQTTQTQFR+MTTAGKA 
Sbjct: 661 THHNSYALADGEHFIVDPDKLSSTIVDLEKILKLIEGLDYSTLRQTTQTQFRDMTTAGKA 720 

Query: 722 LLSKSNLRQGEKQKFLQEAQFFLGRVDLDKAIAKAEKALVTKKATKNGHLLERSINKAVL 781 

LLSKSNLRQGEKQKFLQEAQFFLGRVDLDKAIAKAEKALVTKKATKNG LLERSINKAVL 
Sbjct: 721 LLSKSNLRQGEKQKFLQEAQFFLGRVDLDKAIAKAEKALVTKKATKNGQLLERSINKAVL 780 

Query: 782 AYTOTSAIKKANVKRLEKELDLLTDLVEGKG^ B41 

AYNNSAI KKANVKRLEKELDLLT LVEGKGPLAQATMVQGVYLLKTPLPLPEYY1GLNVY 
Sbjct: 781 AYlOTSAIKKANVKRLEKELDLLTGLVEGKGPIAQATIWQGWLLKTPLPLPEYYIGliNVY 840 

Query: 842 FDKSGI^IYAIjDMSDTIGEGQKDAYGNPIIOTDEDNEGYHTLAVATLADYEGLYIKDIIM 901 

FDKSGKLIYALDMSDTIGEGQKDAYGNPILNVDEDNE3YH LAVATTADYEGL IK ILN 
Sbjct: 841 FDKSGKLIYALDMSDTIGEGQKDAYGNPIMIVDEDNEGYHALAVATLADYEGLDIKTILN 900 

Query: 902 SSLDKIKAIRQIPLAKYHRLGIFQAIRNAAAEADRLLPKTPKGYliNEVPNYRKKQVEKNL 961 

S L ++ +IRQ+P A YHR GIFQAI+NAAAEA++LI.PK 
Sbjct: 901 S KLSQLTS IRQVPTAAYHRAGI FQAIQNAAAEAEQLLPK 939 

Query: 962 KPVDYKTPIFNKALPMKVDGDRAAKGHNINAETNNSVAVTPIRSEQQLHKSQSDVNLPQ 1021 

++++ + N++ ++S + ++ + LP+ 

Sbjct: 940 PGTHSEKSSSSESANSKDRG LQSNPKTNRGRHSAILPR 977 

Query: 1022 TSSKNNFIYEILGYVSLCLLFLVTAGKKGK 1051 

T SK +F+Y ILGY S+ LL L+TA KK K 
Sbjct: 978 TGSKGSFVYGILGYTSVALLSLITAIKKKK 1007 

SEQ ID 800 (GBS97) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 17 (lane 12; MW 113.4kDa). 



WO 02/34771 



PCT/GB01/04789 




GBS97-His was purified as shown in Figure 193, lane 6. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 254 

A DNA sequence (GBSx0269) was identified in S.agalactiae <SEQ ID 803> which encodes the amino acid 
sequence <SEQ ID 804>. This protein is predicted to be ribonucleoside-diphosphate reductase alpha chain 
(nrdE). Analysis of this protein sequence reveals the following: 

Possible site: 48

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4274 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database:

>GP:AAB96160 GB:AE000050 ribonucleoside-diphosphate reductase alpha
chain~MPN324 (new), 513 (Himmelreich et al., 1996)
[Mycoplasma pneumoniae]
Identities = 476/725 (65%), Positives = 586/725 (80%), Gaps = 20/725 (2%)

Query: 2 TQSD--AYLSIiNAKTRFRDRTGNYHFTSDKEAVSQYMIEHVEPNTMVFTSLIEKLDYLVS 59

TQ D +Y+SLNA T+ F D AVE Y+ EHV+P T VF S E+LD+LV 

Sbjct: 12 TQEDLESYISLNAYTKVYG DFKMDLHAVEAYIQEHVKPKTKVFHSTKERIXIFLVK 66 

Query: 60 NNYYESDLLKQYNLEFICQIFEHAYAKKFAFLNFMGALKFYNAYALKTEDNRYYLEHYED 119 

N+YY+ +++ Y+ E +1 AYA +F + NFMGA KFYNAYALKT D ++YLE+YBD 
Sbjct: 67 NDYYDENIINMYSFEQFEEITRKAYAYRFRYANFMGAFKFYNAYALKTFDGKWYLENVED 126 

Query: 120 RVVMNALFIAAGDEKAAYDLVDDMIANRFQPATPTFLNAGKKRRGEYISCYLLRIEDNME 179 
RWMN LFIA G+ A L+ ++ NRFQPATPTFLNAG+K+RGE++SCYLLRIEDNME 

Sbjct: 127 RV\mNVLFUM^KALKLL^ 186 

Query: 180 SISRMSTSLQLSKRGGGVALCLTNLREFGAPIKGIKNQATGIVPVMKLLEDSFSYANQL 239 
SI RAI+T+LQLSKR GGVAL LTN+RE GAPIK I+NQ++GI+P+MKLLEDSFSYANQL 

Sbjct: 187 SIGRAITTTLQLSKRDGGVALLLTNIRESGAPIKKIENQSSGIIPIMKLLEDSFSYANQL 246 

Query: 240 GQRQGAGAVYLHAHHPEVLTFLDTKRENADEKIRIKSLSLGLVIPDITFELAKANKDMAL 299 

GQRQGAGAVYLHAHHP+V+ FLDTKRENADEKIRIKSLSLGLVIPDITF IAK N++MAL 
Sbjct: 247 GQRQGAGAVYLHAHHPDWQFLDTKRENADEKIRIKSLSLGLVIPDITFTLAKNNEEMAL 306 

Query: 300 FSPYDIERVYGKPMSDISITEEYETLLANADIRKTFISARKLFQTIAELHFESGYPYILF 359 

FSPYD+ YGKP+SDIS+TE Y MAN I+KTFI+ARK FQT+AELHFESGYPYILF 
Sbjct: 307 FSPYDVYEEYGKPLSDISVTEMYYELLANQRIKKTFINARKFFQTVAELHFESGYPYIIiF 366 

Query: 360 EDTVNAKNPHKKEGRIWSNLCSEIAQVNTASQFSEDLTFTKVGHDVCCNLGSINIARAM 419 

+DTVN +N H RIVMSNLCSEI Q +T S+F DL F KVG+D+ CNLGS+NIA+AM 
Sbjct: 367 DDTVNRRNAH- -PNRIWSNLCSEIVQPSTPSEFKHDLAFKKVGNDISCNLGSLNIAKAM 424 

Query: 420 DQAADFEKlIANSIRALDRVSRTSDIJ3SAPSIKKGNAANHAVGLGA>mLHGFI J ATNHIYY 479

+ +F +L+ +1 +LD VSR S+L++APSI+KGN+ NHA+GLGAMNLHGFLATN IYY 
Sbjct: 425 ESGPEFSELvTCIAIESLDLVSRVSNLETAPSIQKGNSENHALGLGAMNLHGFIA'TNQIYY 484 

Query: 480 DSQEAIDFTDCFFYAMAYYAFKASNHLAKEKGTFEGFSESSYADGSYFYQY— TEQNF-E 536 

+S EAIDFT+ FFY +AY+AFKAS+ LA EKG F+ F + +ADGSYF +Y E +F 
Sbjct: 485 NSPEAIDFTNIFFYIVAYlIAFKASSEIJUjEKGKFKMFFJ^KFADGSYFDKYIKVEPDFWT 544 

Query: 537 PKTQRVTCNLIAEYGLTLPSQEDWRKLVQSIKEIGIANAHT.LAVAPTGSISYLSSCTPSLQ 596 

PKT+RVK L +Y + +P++E+W++L +I++ GLAN+HLLA+APTGSISYLSSCTPSLQ 
Sbjct: 545 PKTERVKALFQKYQVEIPTRENWKELAIiNIQKNGIjANSHIiIAIAPTGSISYLSSCTPSLQ 604 
Query: 597 PWSPVEVRKEGALGRAreVPAYKIDADNYVYYKKGRYEVGSEAIINIAAAAQKHIDQAIS 656

PWSPVEVRKEG LGR+YVPAY+++ D+Y +YK GAYE+G E IINIAAAAQ+H+DQAIS 
Sbjct: 605 PWSPVEVEKEGRLGRIYVPAYQLNKDSYPFYKDGAYELGPEPIINIAAAAQQHVDQAIS 664 

Query: 657 LTLFMTDQATTRDLNKAYIQAFKQKCASIYYVRVRQDILEGSESYDDMLDDFTSSDLEDC 716 

LTLFMTD+ATTRDLNKAYI AFK+ C+SIYYVRVRQ++LE SE + + +4- C 

Sbjct: 665 LTLFMTDKATTRDLNKAYIYAFKKGCSSIYYVRVRQEVLEDSEDH TIQMQQC 716 

Query: 717 QSCMI 721 
++C+I 

Sbjct: 717 EACVI 721 

A related DNA sequence was identified in S.pyogenes <SEQ ID 805> which encodes the amino acid 
sequence <SEQ ID 806>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.1843 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

>GP:AAC82625 GB:AF054892 surface antigen BspA [Bacteroides forsythus] 
Identities = 124/451 (27%), Positives = 202/451 (44%), Gaps = 65/451 (14%)

Query: 221 FNSYLLKKOTIPTGYKHIGCtfffiFVDNKNIAEVm 279 

L VT+P IG AF + + +P S+ TI ++AF + LK I LP++ 



[Remaining alignment rows not recoverable. Surviving block start coordinates: Query 221, 280, 338, 398, 458, 517, 575, 632; Sbjct 87, 147, 206, 249, 296, 347, 461. Surviving match-line fragments:]

+1 F NSL IGE++F
L+ + LPD LI AF G G



An alignment of the GAS and GBS proteins is shown below: 

Identities = 534/726 (73%), Positives = 614/726 (84%), Gaps = 5/726 (0%) 
Query: 1 MTQSDA-YLSI^AKTOFRDRTGNYHFTSDKEAVEQYMIEHVEPNTMVFTSIjlEKLDYLVS 59 
Sbjct: 1

[Query and Sbjct rows of the remaining alignment blocks not recoverable. Surviving block start coordinates: Query 60, 120, 180, 240, 300, 360, 420, 480, 540, 600, 660; Sbjct 61, 121, 181, 241, 301, 361, 421, 481, 541, 601, 661, 721. Surviving match lines:]

- FLN MGA+KFY +YALKT D + YLE +ED
R VMNALFLA GD+ +D++D +L RFQPATPTFLNAGKKRRGEYI SCYLLR+EDNME
SISRAISTSLQLSKRGGGVALCLTNLRE GAPIKGI+NQATGIVPVMKLLEDSFSYANQL
FSPYDI+R YGK MSDISITEEY+ LLAN I+KT+ISARK FQ IAELHFESGYPY+LF
J +NPH K+GRIVMSNLCSEIAQV+T S F EDL+F +G D+ CCNLGS INIA+AM
FE+LI SIRALDRVSR SDL+ APS++ GNAANHAVGLGAMNLHGFLATNHIYY
+PVEVRKEG+LGR+YVPAY+ID NY YY++GAYEVG +AII++ AAAQKH+DQAISLTL



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 255 

A DNA sequence (GBSx0270) was identified in S.agalactiae <SEQ ID 807> which encodes the amino acid 
sequence <SEQ ID 808>. This protein is predicted to be nrdI protein (nrdI). Analysis of this protein
sequence reveals the following: 

Possible site: 54

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2952 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database:

>GP:AAC71451 GB:U39702 nrdI protein (nrdI) [Mycoplasma genitalium]
Identities = 77/127 (60%), Positives = 104/127 (81%), Gaps = 1/127 (0%)

Query: 7 WYFSSKSNNTHRFVQKLACSNQR1PSD-GSSILVTEDYILIVPTYAGGGDDTKGAVPKQ 65 

+VYFSS SNNTHRF++KL ++RIP D SI V+ +Y+LI PTY+GGG+ +GAVPKQ 
Sbjct: 22 IVYFSSIS™THRFIEKLGFQHKRIPVDITQSITVSNEYVLICPTYSGGGNQVEGfl.VPKQ 81


Query: 66 WQFLNVRQMffiHCQGVISSGNTNFGDTYAIAGPIIARKIJWPLLHQFELIfiTQEDVTRV 125

V4QFLN + NRE C+GVI +SGNTNFGDT4- +AG +I++KUWPLL+QFELLGT+ DV + 
Sbjct: 82 VIQFUmKHNRELCKGVIASGOTMFGDTFCIAGTVISmj^PLLYQFELLGTIO^VEQT 141 

Query: 126 KELLCQF 132

Sbjct: 142 QKIIANF 148 

A related DNA sequence was identified in S.pyogenes <SEQ ID 809> which encodes the amino acid 
sequence <SEQ ID 810>. Analysis of this protein sequence reveals the following:

Possible site: 54

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.0089 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 84/125 (67%), Positives = 100/125 (79%)

Query: 7 WYFSSKSNNTHRFVQKLACSNQRIPSDGSSI1.VTEDYILIVPTYAGGGDDTKGAVPKQV 66 

+VYFSSKSNNTHRFVQKL QRIP D + V+ Y+LIVPTYA GG D KGAV KQV 
Sbjct: 6 IVYFSSKSNNTHRFVQKLGLPAQRIPTONRPLEVSrHYLLIVPTYARGGSDAKGAVSKQV 65 


Query: 67 VQFLNVRQNREHCQGVI SSGNTNFGDTYAIAGPI I ARK1MVPLLHQFELLGTQEDVTRVK 126 

++FLN NR+HC+GVI SSGNTNFGDT+A+AGPI I + + KL VPLLHQFELLGT DV +V+ 
Sbjct: 66 IRFI^PI^KHCKGVISSGNTNFGDTFALAGPIISQKLQVPLLHQFELLGTATDVKiCVQ 125 

Query: 127 ELLCQ 131

Sbjct: 126 AIFAR 130 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics.

Example 256 

A DNA sequence (GBSx0271) was identified in S.agalactiae <SEQ ID 811> which encodes the amino acid
sequence <SEQ ID 812>. This protein is predicted to be ribonucleoside-diphosphate reductase beta chain
(nrdF). Analysis of this protein sequence reveals the following:

Possible site: 27

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3889 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database:

>GP:AAB96162 GB:AE000050 ribonucleoside-diphosphate reductase beta
chain [Mycoplasma pneumoniae] 
Identities = 261/335 (77%) , Positives = 301/335 (88%) 

Query: 2 QSYYDRSQSPLDYMiSEKAFPMRSVl^KliroDKDLEV^IRVTQNFWLPEKIPVSNDLNS 61 

+ Y+ S SPL+YA + +RSVNWN ++D+KDLEVWNR+TQNFWLPEKIPVSND+ S 
Sbjct: 5 KKY'FLESVSPLEYAQKKFQGmRSVNWNLVDDEKDLEVWNRITQNFWDPEKIPVSNDIPS 64 

Query: 62 WRTLDADWQQLITRTFTGLTLLDSVQATVGDIAQIKHSQTDHEQVIYANFAFMVAIHARS 121 

W+ L +WQ LIT+TFTGLTLLD++QAT+GDI QI ++ TDHEQVIYANFAFMV +HARS 
Sbjct: 65 WKQLSKEWQDLITKTFTGLTLLDTIQATIGDIKQIDYALTDHEQVIYANFAFMVGVHARS 124 

Query: 122 YGTIFSTLCTSQQIEEAHEWVVDTESLQARSRILIPFYTGDDPLKSKVAAAMMPGFLLYG 181 

YGTIFSTLCTS+QI EAHEWW TESLQ R++ LIP+YTG DPLKSKVAAA+MPGFLLYG 
Sbjct: 125 YGTIFSTLCTSEQITEAHEWWKTESLQKRAKALIPYYTGKDPLKSKVAAALMPGFLIiYG 184 

Query: 182 GFYLPFYLSARGKLPNTSDIIRLILRDKVIHNYYSGYKYQQKVAKLSVEKQAEMKTFVFD 241 

GFYLPFYLS+R +LPNTSDIIRLILRDKVIHNYYSGYK+Q+KV K+S EKQAEMK FVFD 
Sbjct: 185 GFYLPFYLSSRKQLPNTSDIIRLILRDKVIHNYYSGYKFQRKVEKMSKEKQAEMKRFVFD 244 



Sbjct: : 

Query: 302 QLSARADENHDFFSGNGSSYIMGITEETLDEDWEF 336 

QLSARADENHDFFSGNGSSY+MGI+EET D+DW+F 
Sbjct: 305 QLSARADENHDFFSGNGSSYVMGISEETEDKDWDF 339 

A related DNA sequence was identified in S.pyogenes <SEQ ID 813> which encodes the amino acid 
sequence <SEQ ID 814>. Analysis of this protein sequence reveals the following: 

Possible site: 18

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3779 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 292/335 (87%) , Positives = 318/335 (94%) 
Query: 
Sbjct: 

Query: 62 WRTLDADWQQLITRTFTGLTLLDSVQATVGDIAQIKHSQTDHEQVIYANFAFMVAIHARS 121 

WR+L DWQQLITRT+TGLTLLD+VQATVGD+AQI+HSQTDHEQVIY NFAFMV IHARS 
Sbjct: 63 WRSLGEDWQQLITRTYTGLTLLDTVQATVGDVAQIQHSQTDHEQVIYINFAFMVGIHARS 122 

Query: 122 YGTIFSTLCTSQQIEEAHEWWDTESLQARSRILIPFYTGDDPLKSKVAAAMMPGFLLYG 181 

YGTIFSTLC+S+QIEEAHEWW T+SLQ R+R+LIP+YTGDDPLKSKVAAAMMPGFIiYG 
Sbjct: 123 YGTIFSTLCSSEQIEEAHEWVVSTQSLQDRARVLIPYYTGDDPLKSKVAAM#IPGFLLYG 182 

Query: 182 GFYLPFYLSARGKLPNTSDIIRLILRDK^IHNYYSC-YKYQQCTAKLSVEKQAEMKTFVFD 241 

GFYLPFYLSARGK+PNTSDIIRLILRDKVIHNYYSGYKYQQKVA+LS EKQAEMK FVFD 
Sbjct: 183 GFYLPFYLSARGKMPNTSDIIRLILRDKVIHNYYSGYKYQQKVARLSPEKQAEMKAFVFD 242 

Query: 242 LLYQLIDLEKAYLYELYDGFDLAEDAIRFSIYNAGKFLQNLGYDSPFTEEETRISPEVFA 301 

LLY+LIDLEKAYL ELY GFDIAEDAIRFS+YKAGKFLQNLGY+SPFT+EETR+SPEVFA 
Sbjct: 243 LLYELIDLEKAYLRELYAGFDLAEDAIRFSLYNAGKFLQNLGYESPFTDEETRVSPEVFA 302 

Query: 302 QLSARADENHDFFSGNGSSYIMGITEETLDEDWEF 336 

QLSARADENHDFFSGNGSSY+MGITEET D+DWEF 
Sbjct: 303 QLSARADENHDFFSGNGSSYVMGITEETIDDDWEF 337 
Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 257 

A DNA sequence (GBSx0272) was identified in S.agalactiae <SEQ ID 815> which encodes the amino acid 
sequence <SEQ ID 816>. This protein is predicted to be rhamnosyltransferase. Analysis of this protein 
sequence reveals the following: 



>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1741 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



A related GBS nucleic acid sequence <SEQ ID 9583> which encodes amino acid sequence <SEQ ID 9584> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database: 



Query: 11 QINICLATYNGQKYLRQQLDSIIQQGYTDWICLIRDDGSTDDTVAIIKEYVNRDSRFIFI 70
++NI ++TYNGQ+++ QQ+ SI +Q + +W LIRDDGS+D T II ++ D+R FI
Sbjct: 2 KVNILMST-fNGQEFIAOQIOSlQKQTFENWNLLIRDDGSSDGTPKIIADFAKSDARIRFI 61

Query: 71 NSNDDRKLGSHRSFyELVNTKKADFyVFSDQDDVWKENRLERYLEEAEKFNQELPLLVYS 130
N++ G ++FY L+ Y+KAD+Y FSDQDDVW +LE L EK N ++PL+VY4
Sbjct: 62 NADKRENFGVIKNFYTLLKYEKADYYFFSDQDDWLPQKLELTLASVEKENNQIPLMVYT 121

Query: 131 NWTSVDEKLTVL KEHNPATVIQEQIAFNQINGMvIMMKIHELAiajWE--YRQIG 181
+ TVDLVL +H+T + E++N + G +M+NH LAK W+ Y +
Sbjct: 122 DLTVVDFJ3LQVLHDSMIKTQSHHANTSLLEELTE^^IVTGGTM^^VM^CIAKQWKQCYDDLI 181

Query: 182 AHDSYVGTLAYAVGNVAYI SDSTVLWRRQ VGAES LNNYGRQYG - VATFWQMI 232
HD Y+ LA ++G + Y+ 4-+T L+R+ +GA + L N+ R + V +W ++
Sbjct: 182 MHDWYIJUjLAASLGKLIYLDETTELYRQHESNTO 241

Query: 293 TILLLTGYG 301
Sbjct: 299 KTLIITKFG 307



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.



Example 258 

A DNA sequence (GBSx0273) was identified in S.agalactiae <SEQ ID 819> which encodes the amino acid 
sequence <SEQ ID 820>. Analysis of this protein sequence reveals the following: 

Possible site: 35

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -4.19 Transmembrane 1213-1229 (1211-1230)

Final Results

bacterial membrane --- Certainty=0.2678 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9581> which encodes amino acid sequence <SEQ ID 9582> 
was also identified.

There is also homology to SEQ ID 822. 

A related GBS gene <SEQ ID 8525> and protein <SEQ ID 8526> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 7
SRCFLG: 0
McG: Length of UR: 3
Peak Value of UR: 2.28
Net Charge of CR: 4
McG: Discrim Score: 1.29
GvH: Signal Score (-7.5): 2.84
Possible site: 30
>>> Seems to have a cleavable N-term signal seq.
Amino Acid Composition: calculated from 31
ALOM program count: 0 value: 1.16 threshold: 0.0
PERIPHERAL Likelihood = 1.16 344
modified ALOM score: -0.73

*** Reasoning Step: 3

Final Results

bacterial outside --- Certainty=0.3000 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

LPXTG motif: 1197-1201

SEQ ID 8526 (GBS147) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 29 (lane 4; MW 132kDa).

The GBS147-His fusion product was purified (Figure 200, lane 5) and used to immunise mice. The
resulting antiserum was used for FACS (Figure 286), which confirmed that the protein is immunoaccessible
on GBS bacteria.

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 259 

A DNA sequence (GBSx0274) was identified in S.agalactiae <SEQ ID 823> which encodes the amino acid
sequence <SEQ ID 824>. This protein is predicted to be Acetyltransferase (GNAT) family. Analysis of this 
protein sequence reveals the following: 

Possible site: 57

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2781 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAG03505 GB:AE004449 conserved hypothetical protein [Pseudomonas aeruginosa] 
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Identities = 66/143 (46%) , Positives = 94/143 (65%) , Gaps = 5/143 (3%) 

Query: 2 WNVKTFDNLTTHELFQIYKLRVSVFVVEQDCPYQEVDDEDLI--CLHGMNWVI)GQLAAYY 59 

W K +LT EL+ + +LR VFWEQ CPYQEVD DL+ H M W DGQL AY 
Sbjct: 5 WTCKHHADLTLKELYALLQLRTSVFWEQKCPYQEVDGLDLVGDTHHLMAWRDGQLLAYL 64 

Query: 60 RLIP EDDKVHLGRVIVNPDFRKKGLGNQLVEYAIKFSEANYPNKPIYAQAQAYLQDF 116 

RL+ + 4V +GRV+ + R +GLG+QL+E A++ +E + + P+Y AQA+LQ + 

Sbjct: 65 RLLDPV^EGQVVIGRVVSSSAARGQGLGHQLMERALQAAERLWLDTPVYLSAQAHLQAY 124 

Query: 117 YQSFGFQPVSDI YLEDNI PHLDM 139 

Y +GF V+++YI1ED4IPH+ M 
Sbjct: 125 YGRYGFVAVTEVYLEDDIPHIGM 147 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.
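
The Identities, Positives and Gaps figures quoted with each alignment are ratios over the aligned length. The following is a minimal sketch, not part of the original analysis, of how the statistics line reported for the Pseudomonas aeruginosa hit could be parsed and its percentages recomputed. The function name `parse_blast_stats` is hypothetical, and the observation that the printed percentages are the whole-number part of the ratio is an assumption drawn only from this one line.

```python
import re

# Matches "Identities = 66/143", "Positives = 94/143", "Gaps = 5/143".
STAT = re.compile(r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)")


def parse_blast_stats(line):
    """Return {name: (count, aligned_length, percent)} for one statistics line."""
    stats = {}
    for name, count, length in STAT.findall(line):
        count, length = int(count), int(length)
        stats[name] = (count, length, 100.0 * count / length)
    return stats


# Statistics line as printed for the GBSx0274 hit (hypothetical variable names).
line = "Identities = 66/143 (46%) , Positives = 94/143 (65%) , Gaps = 5/143 (3%)"
stats = parse_blast_stats(line)

for name, (count, length, pct) in stats.items():
    print(f"{name}: {count}/{length} = {pct:.2f}%")
```

Recomputing the ratios in this way is a quick check against transcription errors in the count or the aligned length.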

Example 260 

A DNA sequence (GBSx0275) was identified in S.agalactiae <SEQ ID 825> which encodes the amino acid 
sequence <SEQ ID 826>. Analysis of this protein sequence reveals the following: 

Possible site: 37 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.2010 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.
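
Each "Final Results" block lists a certainty value for three candidate localisations, and the predicted localisation is the one with the highest value. As an illustrative sketch only (the helper `parse_final_results` and the regular expression are assumptions, not part of the prediction software), the block reported for GBSx0275 could be read as follows:

```python
import re

# One result line: "bacterial <location> --- Certainty=<value> (<verdict>) < succ>".
# Stray spaces inside the number are tolerated and removed before conversion.
RESULT = re.compile(
    r"bacterial\s+(\w+)\W*Certainty\s*=\s*([0-9. ]+?)\s*\((.+?)\)"
)


def parse_final_results(text):
    """Return {location: (certainty, verdict)} for a Final Results block."""
    results = {}
    for location, value, verdict in RESULT.findall(text):
        results[location] = (float(value.replace(" ", "")), verdict)
    return results


block = """
bacterial cytoplasm --- Certainty=0.2010 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
"""

results = parse_final_results(block)
# The predicted localisation is the entry with the largest certainty.
best = max(results, key=lambda location: results[location][0])
print(best, results[best])
```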

Example 261 

A DNA sequence (GBSx0276) was identified in S.agalactiae <SEQ ID 827> which encodes the amino acid 
sequence <SEQ ID 828>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.2935 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAB12631 GB:Z99108 similar to RNA methyltransferase [Bacillus subtilis] 
Identities = 217/448 (48%), Positives = 298/448 (66%), Gaps = 4/448 (0%) 

Query: 7 QRIPLKIKRMGINGEGIGFYKKTLIFvPGALKGEEVFCQISSvRRNFAEAKLLKINKKSK 66 

Q PL IKR+GINGEG+G++KK ++FVPGAL GEEV Q + V+ F+E ++ KI K S+ 
Sbjct: 16 QTFPLTIKRLGINGEGVGYFKKKWFVPGALPGEEVWQATKVQPKFSEGRIKKIRKASE 75 



Query: 67 NRVEPPCSIYKECGGCQIMHLQYDKQLEFKTDVIRQALKKFKPEGYSNYEIRKTIGMSEP 126 
+RV PPC +Y++CGGCQ+ HL Y +QL K D++ Q+L + EN EI++TIGM P 
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Sbjct: 


+YR K QFQ+ RS G++ AGLY +H ++ IKDC+VQ 



4GEVQ++ +T+K +++V + + PE+K++ N 



+N +KTS I+G+ T+ + G+ I E + D F LS RAF+QLNP+QT HE E t 



^ ++DAYCGVGTIG4- A K VRGMD+I E+I DAK+NA G N Y 



+P+W EGFR + +IVDPPRTG D L+TI K+ P++ VYVSCM STIA+DL TL+K 



Y V YIQ VDMFP TA EAV +L K 
DYRVDYIQPVDMFPQTAHVEAVARLVLK 463 

A related DNA sequence was identified in S.pyogenes <SEQ ID 829> which encodes the amino acid 
sequence <SEQ ID 830>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.2980 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 327/450 (72%) , Positives = 397/450 (87%) 

Query: 1 MNVVLKQRIPLKIK^GIKGEGIGFYKKTIjIFVPGALKGEEVFCQISSTORNFAFAICLLK 60 

M V +KQ+IPLKIKRMG1NGEGIGFY+KTL+FVPGALKGE++FCQI++V+RNFAEAKLL 
Sbjct: 1 MVVCTKQKIPLKIKPJ.lGINGEGIGFYQKTLVWPGALKGEDIFCQITAVKRNFAFAltt.LT 60 

Query: 61 INKKSKNRVEPPCSIYKECGGCQIMHLQYDKQLEFKTDVIRQALMKFKPEGYENYEIRKT 120 

+NK SKKRV+P CS+Y+ CGGCQIMHL Y KQti+FK DVIRQAL KFKP GYE +EIR T 
Sbjct: 61 VNKASKNRVKPACSVYETCGGCQIMHLAYPKQLDFKDDVIRQALKKFKPTGYEQFEIRPT 120 

Query: 121 IGMSEPEHYRAKLQFQVRSFGGNVICAGLYAQGTHRLIDIKDCLVQDSLTQEMINRVAELL 180 

+GM +P+HYRAKLQFQ+RSFGG VKRGL++QG+HRL+ I +CLVQD LTQ++IN++ +L+ 
Sbjct: 121 LGMKKPDHYRAKLQFQLRSFGGTVKAGLFSQGSHRLVP IDNCLVQDQLTQDI INKITQLV 180 

Query: 181 GKYKLPIYNERKIAGTOTVMIRRAQASGETVQLIFITSKRLDFDDWIELWEFPELKTVA 240 

KYKLPIYNERKIAG+RT+M+R+AQAS +VQ+I ++SK + + + EL + FP++KTVA 
Sbjct: 181 DKYKIiPIYNERKIAGIRTIMWKAQASDQVQIIWSSKETOIANFIGELTKAFPQVKTVA 240 

Query: 241 VNINASKTSDIYGQITEVIWGQESINEEVLDYGFSLSPRAFYQLNPKQTQILYSEAVKAIj 300 

+N N SK+S+IYG TE+ +WGQE+I +EEVLDYGF+LSPRAFYQ1MP+QT+ +LY E VKAL 
Sbjct: 241 LNSNRSKSSE IYGDETE I LWGQEAI HEE VLDYGFALS PRAFYQLNPQQTEVLYGEWKAL 300 

Query: 301 DVKEDDDLIDAYCGVGTIGI^FAGICVTiSVT.GMDIIPEAIQDAKENALYMGFTNTHYEAGK 360 

DV D +IDAYCGVG+IG AFAGKVKSVRGMDIIPEAI+DA++NA MGF N +YSAGK 
Sbjct: 301 DVGSKDHI IDAYCGVGS IGFAFAGKVKSVRGMDI IPEAIEDAQKNAKAMGFDN&YYBAGK 360 



Query: 361 AEDIIPRWYSEGFRANALIVDPPRTGLDDKLLNTILI<MPPEK^WYVSCNTSTLARDLVTL 420 
AEDII +WY +G+RA+A4 IVDPPRTGLDDKLL TIL P++MVYVSCNTSTLARDLV L
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Sbjct: 361 AEDIISKOTKQGYRADAVIVDPPRTQLDDKLLKTILHyQPKQtWY^/SCNTSTLftRDLVQL 420 

Query: 421 TKVYHVHYIQSVDMFPHTARTFJWVKLQRK 450 

TKVY VHYIQSVDMFPHTARTEAWKLQ++ 
Sbjct: 421 TKVYDVHYIQSVDMFPHTARTEAWKLQKR 450 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 262 

A DNA sequence (GBSx0277) was identified in S.agalactiae <SEQ ID 831> which encodes the amino acid 
sequence <SEQ ID 832>. Analysis of this protein sequence reveals the following: 



Final Results

bacterial cytoplasm --- Certainty=0.3505 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database: 

>GP:BAB04643 GB:AP001510 unknown conserved protein in B. subtilis 
[Bacillus halodurans] 
Identities = 74/263 (28%) , Positives = 141/263 (53%) , Gaps = 9/263 (3%) 

Query: 3 ITKIEKKKR LYTLEL-DNTENLY ITEDTIVHFMLSKGMIINAEKLENIKKFAQL 55 

IT+IE +KR Y + + N +++Y + E ++ h KG+ I+AE+++ I ++ 
Sbjct: 4 ITRIEVQKRNIffiRYNIFIHQNGQDVYAFSVDEQVLlKQGLRKGLDIDAEQMKQILYEDEV 63 

Query: 56 SYGKNLGLYYISFKQRTEKEVIKYLQQHD1DSKIIPQIIDNLKSENWINDKNYVQSFIQQ 115 

NL L+Y+S++ R+ EV YL++ D + II ++ L + ++D + ++FIQ 
Sbjct: 64 QKTFNLALHYLSYRMRSVHETOTYLKKTOREEPIIEHVLHRLTEQRLLDDHAFAEAFIQT 123 

Query: 116 NLNTGDKGPYVIKQKLLQKGIKSKIIESELQAINFQDLASKISQKLYKKYQNKLPLKAL- 174 

T KGP +KQ+L +KG+ K IE L ++++ ++ L K+ +h 
Sbjct: 124 KRATTSKGPLKLKQELAEKGVSEKTIEGALTTFSYEEQVEQVKAWLEKQKGRTFKGSSLA 183 

Query: 175 -KDKLMQSLTTKGFDYQIVHTVIQNLEIEKDQELEEDLIYKELDKQYQKLSKKHDQYELK 233 

K KL + L KG+ ++ ++ I++++E E + + +K +K + K +EL+ 

Sbjct: 184 WKQKLSRQLLAKGYTSPVIEEAFADVPIKQEEEEEl'JEALKAFGEKAMRKYAGKKTGWELQ 243 

Query: 234 QRIINALMRKGYQYEDIKSALRE 256 

Q++ AL RKG+ E 1+ L + 
Sbjct: 244 QKVKQALYRKGFSLEMIERYLND 266 

A related DNA sequence was identified in S.pyogenes <SEQ ID 833> which encodes the amino acid 
sequence <SEQ ID 834>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2388 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below: 

Identities = 146/258 (56%) , Positives = 190/258 (73%) 



Query: 1 MKITKIEKKKKLYTLELDNTENLYITEDTIVHBT4LSKGMIINAEKLENIKKFAQLSYGKN 60 
MKITKIEKKKRLY +ELDN E+LY+TEDTIV FMLSK +++ ++LE++K FAQLSYGKN 
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Sbjct: 1 MKITKIEKKKRLYLIELDNDESLYVTEDTIVRFMLSKDKVLDNDQLEDMKHFAQLSYGKK 60 

Query: 61 LC3LYYISFKQRTEKEVIKYLQQHDIDSKIIPQIIDWLKSFJMINDKNWQSFIQQNLNTG 120 

L LY++SF+QR+ K+V YL++H+I+ II II L+ E WI+D ++I+QN G 

Sbjct: 61 BALYFLSFQQRSNKQVADYLRKHEIEEHIIADIITQLQEEQWIDDTKIADTYIRQNQIjKG 120 

Query: 121 DKGPYVIKQKLLQKGIKSKIIESELQMOTQD^KISQKLYKKYQNIOjPLKALKDKLMQ 180 

DKGP V+KQKLLQKGI S 1+ L +F LA K+SQKL+ KYQ KLP KALKDK+ Q 
Sbjct: 121 DKGPQVLKQKLLQKGIASHDIDPILSQTDFSQIAQKVSQKLFDKYQEKLPPKALKDKITQ 180 

Query: 181 SLTTKGFDYQIVHTVIQNLEIEKDQELEEDLIYKELDKQYQKLSKKHDQYELKQRIINAL 240 

+L TKGF Y + + +L ++D + EDL+ KELDKQY+KLS+K+D Y LKQ++ AL 
Sbjct: 181 ALLTKGFSYDLAKHSLNHLNFDQDNQEIEDLLDKELDKQYRKLSRKYDGYTLKQKLYQAL 240 

Query: 241 MRKGYQYED I KSALREYL 258 

RKGY +DI LR YL 
Sbjct: 241 YRKGYNSDD INCKLRNYL 258 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 263 

A DNA sequence (GBSx0278) was identified in S.agalactiae <SEQ ID 835> which encodes the amino acid 
sequence <SEQ ID 836>. Analysis of this protein sequence reveals the following: 

Possible site: 33 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3912 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:BAB04659 GB:AP001510 unknown conserved protein in B. subtilis 
[Bacillus halodurans] 
Identities = 96/175 (54%) , Positives = 122/175 (68%) 

Query: 1 MRLPKEGDFITIQSYKHDGSLHRTWRDTMVLKTTENALIGVNDHTLVTENDGRRWVTREP 60 

M PK G I IQSYKH+GS+HR W +T+VLK T +IG ND hV E+DGR W TREP 
Sbjct: 1 MNFPKVGSKIQIQSYKHNGSIHRIWEETIVLKGTSKV\'IGGI<IDRILVKESDGRHWRTREP 60 

Query: 61 AIvYFHKKYWFNI IAMIRETGVSYYCNLASPYILDPEALKYIDYDLDVKVFADGEKRLLD 120 

AI YF + WFN I MIR G+ +YCNL +P+ D EALKYIDYDLD+KVF D +LLD 
Sbjct: 61 AICYFDSEQWFNTIGMIRADGIYFYCTLGTPFTKDEEALKYIDYDLDIKVFPDMTFKLLD 120 

Query: 121 VDEYEQHKAQMNYPTDIDYILKENVKILVEWINENKGPFSSSYINIWYKRYLELK 175 

DEY H+ M YP +ID IL+ +V LV WI++ KGPF+ ++ WY+R+L+ + 
Sbjct: 121 EDEYAMHRKMMKYPPEIDRILQRSVDELVSWIHQRKGPFAPQFVESWYERFLQYR 175 

A related DNA sequence was identified in S.pyogenes <SEQ ID 837> which encodes the amino acid
sequence <SEQ ID 838>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3912 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below: 
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Identities = 155/177 (87%) , Positives = 165/177 (92%) 

Query: 1 MRLPKEGDFITIQSYKHDGSLHRTWRDTMVLKTTEHALIGVM)HTLWEMDGRRWVTREP 60 

M+LPKEGDFITIQSYKHDGSLHRTKRDTMVLKTTENALIGV1IDHTLVTE+DGRRWVTREP 
Sbjct: 1 MKLPKEGDFlTIQSYKHDGSlBRTWRrlT^WLKTTENALIG^7]^HTLOTESDGRRWIREP 60 

Query: 61 AIWFHKKYWFNIIAMIRETGVSYYCNLASPYILDPEALKYIDYDLDVKVFADGEKRI1I1D 120 

AIVYFHKKYWFNIIAMIR4 GVSYYCNLASPY++D EALKYIDYDLDVKVFADGEKRLLD 
Sbjct: 61 AIWFHKICYWFNIIAMIRDNGVSYYCNIASPYI-IMDTEALKYIDYDLDVKVFADGEKRLLD 120 

Query: 121 VDEYEQHKAQMNYPTDIDYILKEN\ r KILVZW:NENKGPFSSSYINIWYKRYLELKKR 177 

VDEYE HK +M Y D+D+ILKENVKILV+WIN RGPFS +YI IWYKRYLELK R 
Sbjct: 121 VDEYEIHKKEMQYSADMDFILKENVKILVaWINHEKGPFSKAYITIWYKRYLELKNR 177 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 264 

A DNA sequence (GBSx0288) was identified in S.agalactiae <SEQ ID 839> which encodes the amino acid 
sequence <SEQ ID 840>. This protein is predicted to be jag protein. Analysis of this protein sequence 
reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1666 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:BAB07782 GB:AP001520 spoIIIJ-associated protein [Bacillus halodurans] 
Identities = 54/198 (27%) , Positives = 98/198 (49%) , Gaps = 6/198 (3%)

Query: 100 DWEEYIEEVDETLEKEDVSQPELPKIDDKNV\TTSEiIEKIDLLPNIEVAAAQVTKYVE 159 

+ VE+ I E+ T E+ + E PK ++ + A+ ++ + P+ + ++E 

Sbjct: 13 EAVEQAIIELGTTRERITYTWEEPKSGLFGILGSKPAVIEVWKPD PVDRAKAFLE 69 

Query: 160 NIIYEMDLDA- -TIETTTSKRQINLQIETPEftGRIIGYHGKVLKSLQLLAQNYLHDRFSK 217 

++ EMD++ TIE + N+ E + G +IG G+ L SLQ L + + 

Sbjct: 70 ELLQEMDMEvEVTIEKDPATVLFNISGEQ-DLGTLIGKRGOTLDSLQYLVNLVANKEEGE 128 



Query: 278 ESYSEGNDPNRFWVTKK 295 

E+YSEG R W+ K 
Sbjct: 189 ETYSEGQGIGRHWIAPK 206 
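
The identity count in an alignment is the number of columns in which the Query and Sbjct rows carry the same residue. The sketch below is an illustration only; the function name `count_identities` is hypothetical, and the two strings are the last Query and Sbjct rows exactly as printed above. It shows how the count for a single block can be obtained.

```python
def count_identities(query, subject):
    """Count aligned columns where both rows show the same residue (gaps excluded)."""
    if len(query) != len(subject):
        raise ValueError("aligned rows must have the same length")
    return sum(1 for q, s in zip(query, subject) if q == s and q != "-")


# Final block of the GBSx0288 / Bacillus halodurans alignment, as printed.
query = "ESYSEGNDPNRFWVTKK"
subject = "ETYSEGQGIGRHWIAPK"

identical = count_identities(query, subject)
print(f"{identical} identical positions out of {len(query)} aligned columns")
```

Summing this count over every block of an alignment should reproduce the numerator of the Identities figure, provided the rows have been transcribed without error.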

A related DNA sequence was identified in S.pyogenes <SEQ ID 841> which encodes the amino acid
sequence <SEQ ID 842>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3721 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below: 
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Identities = 176/302 (58%), Positives = 223/302 (73%), Gaps = 32/302 (10%) 

Query: 23 MVLFTGATVEEAIEKGLQEl.NISRLRAHIKWSREKKGFLGF'GKKPAICVEIEGITDEVTD 82 

MVLFTG TVEEAIE GLQEL +SRL+AHIKV+S+EKKGFLGFGKKPA+V+IEGI+D+ 
Sbjct: 1 MVliFTGKTVEEAIETGLQELGLSRIjKAHIKVISKEKKGFLGFGKKPAQVDIEGISDKrVY 60 

Query: 83 INESVALKNI KHVPS- -SVTJVVEEYrEEVTJETLEKEDVSQPELPKIDDK 129 

+ A + + +N P+ S DV E 1+ + LE ED L D 

Sbjct: 61 KOTKKATRGVPEDINRQNTPAWSMIVEPEEIKAT-QRLEAEDTKVVPLMSEDSPAQTPS 119 

Query: 130 NWTTSEA IEKIDL LPNIEVaAAQVTKYVENIIYEMDLDATI 171 

VT ++A +E+ ++ +IE AA +V+ YV IIYEMD++AT+ 

Sbjct: 120 MjAETVTETKAQQPSIPVEESEVPQDP^IDGFSKDIEKAAQEVSDYVTKIiyEMDIEATV 179 



Query: 172 ETTTSK 

ET+ ++RQINLQIETPEAGR+IGYHGKVLKSLQLIAQN+LHDR+SK4-FSVS+NVHDWEH 
Sbjct: 180 ETSNNRRQINLQIETPEAGRVIGYHGKVLKSLQLIAQNFLHDRYSKHFSVSLtJVHDYVEH 239 



Sbjct: 240 RTETLIDFTQKVAfCRVLESGQDYTMDPMSNSERKIVHKTVSSlEGVDSYSEGNDPlJRYW 299 

Query: 292 VT 293 
V+ 

Sbjct: 300 VS 301

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 265 

A DNA sequence (GBSx0290) was identified in S.agalactiae <SEQ ID 843> which encodes the amino acid
sequence <SEQ ID 844>. This protein is predicted to be 60 kd inner-membrane protein (yidC). Analysis of 
this protein sequence reveals the following: 

Possible site: 42

>>> May be a lipoprotein

INTEGRAL Likelihood = -7.38 Transmembrane 54 - 70 ( 52 - 75)
INTEGRAL Likelihood = -5.20 Transmembrane 193 - 209 ( 192 - 211)
INTEGRAL Likelihood = -3.61 Transmembrane 125 - 141 ( 124 - 144)
INTEGRAL Likelihood = -2.44 Transmembrane 168 - 184 ( 167 - 184)

Final Results

bacterial membrane --- Certainty=0.3951 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database:

>GP:CAA78595 GB:Z14225 SpoIIIJ [Bacillus subtilis] 
Identities = 79/243 (32%) , Positives = 142/243 (57%) , Gaps = 5/243 (2%) 

Query: 1 MKKKLKTFSLILLTGSLLVACG- -RGEVSSHSATIjWEQ- IVYAFAKSIQWLS- - FMHSIG 55 
MK+++ ++ LL C + +++ S W++ +VY ++ I + G

Sbjct: 1 MKRRIGLLLSMVGVFMLIAGCSSVKSPITADSPHFTO3!<YWYPLSELITYVAKLTGDNYG 60 

Query: 56 LGIILFTLIIRAIMMPLYNMQMKSSQKMQEIQPRLKELQKKYreKDPDlffiLKIJSDEMQSM 115 
L IIL T++IR +++PL Q+4-SS+ MQ +QP +++L++KY KD + KL E ++ 
Sbjct: 61 LSIILOTILIRLLILPLMIKQLRSSKAMQALQPEMQKLKEKYSSKDQKTQQKLQQETMAL 120

Query: 116 YKAEGVHPYASVLELLIQLEVLmLFQALTRVSFLKVGTFLSLELSQPDPYyiLPVlAAL 175 

++ GVWP A P+LIQ+P+L + A+ R + +FL +L ■(■ DPYYILP++A + 
Sbjct: 121 FQKHGVNPLAGCFPILIQMPILIGFYHAIMRTQAISEHSFLWFDLGEKDPYYILPIVAGV 180 


Query: 176 FTFLSTWLTOl^vEKinALT^^ 235 
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Query: 236 NNP 238 
P 

Sbjct: 241 KGP 243 

A related GBS sequence was identified <SEQ ID 10783> which encodes amino acid sequence <SEQ ID 
10784>. 
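
The INTEGRAL lines reported above for GBSx0290 each give a likelihood score, a core transmembrane span and, in parentheses, an extended span. As a hedged sketch (the function `parse_tm_segments` and the dictionary keys are hypothetical names introduced for illustration), the four segments can be extracted and ordered along the sequence like this:

```python
import re

# "INTEGRAL Likelihood = -7.38 Transmembrane 54 - 70 ( 52 - 75)"
TM = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+\.\d+)\s+Transmembrane\s+"
    r"(\d+)\s*-\s*(\d+)\s*\(\s*(\d+)\s*-\s*(\d+)\s*\)"
)


def parse_tm_segments(text):
    """Return the predicted transmembrane segments sorted by start position."""
    segments = []
    for m in TM.finditer(text):
        segments.append({
            "likelihood": float(m.group(1)),
            "core": (int(m.group(2)), int(m.group(3))),
            "extended": (int(m.group(4)), int(m.group(5))),
        })
    return sorted(segments, key=lambda seg: seg["core"][0])


report = """
INTEGRAL Likelihood = -7.38 Transmembrane 54 - 70 ( 52 - 75)
INTEGRAL Likelihood = -5.20 Transmembrane 193 - 209 ( 192 - 211)
INTEGRAL Likelihood = -3.61 Transmembrane 125 - 141 ( 124 - 144)
INTEGRAL Likelihood = -2.44 Transmembrane 168 - 184 ( 167 - 184)
"""

segments = parse_tm_segments(report)
for seg in segments:
    start, end = seg["core"]
    print(f"{start}-{end} (length {end - start + 1}), likelihood {seg['likelihood']}")
```

Ordering the segments by position makes it easier to compare the GBS prediction with the four spans reported for the S.pyogenes homologue.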

A related DNA sequence was identified in S.pyogenes <SEQ ID 845> which encodes the amino acid 
sequence <SEQ ID 846>. Analysis of this protein sequence reveals the following: 

Possible site: 49 

>>> May be a lipoprotein 

INTEGRAL Likelihood = -6.32 Transmembrane 198 - 214 ( 197 - 220)
INTEGRAL Likelihood = -5.52 Transmembrane 59 - 75 ( 57 - 80)
INTEGRAL Likelihood = -4.25 Transmembrane 130 - 146 ( 129 - 150)
INTEGRAL Likelihood = -2.28 Transmembrane 173 - 189 ( 170 - 189)

Final Results

bacterial membrane --- Certainty=0.3527 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases:

>GP:BAA05234 GB:D26185 stage III sporulation [Bacillus subtilis] 
Identities = 90/249 (36%) , Positives = 150/249 (60%) , Gaps = 6/249 (2%) 

Query: 16 IVPLVLLLVACG--RGEVTAQSSSGWDQ-LVYLFARAIQWLS--FDGSIGVGIILFTLTI 70
          +V + +LL C + +TA S WD+ +VY + I +++ + G4 IIL T+ I
Sbjct: 13 MVGWML^GCSSVKEPITADSPHFWDKyWYPLSELITYVAKLTGDNYGLSIILVTILI 72

          RL+++PL Q++SS+ MQ +QPE++4L+ KY+ KD +T+ KL +E+ AL++K+GVNP A

          +FLW +L + D Y4LP++A V TF+

Query: 250 RQRLANEEK 258



An alignment of the GAS and GBS proteins is shown below: 

Identities = 172/270 (63%) , Positives = 217/270 (79%) , Gaps = 1/270 (0%) 

Query: 1 MKKKLKTFSLILLTGSLLVACGRGEVSSHSATLWEQIVYAFAKSIQWLSFNHSIGLGIIL 60 

+KK +K ++ L LLVACGRGEV++ S++ W+Q+VY FA++IQWLSF+ SIG+GIIL 
Sbjct: 7 VKKNIKIARIVPLV-LLLVACGRGEVTAQSSSGWDQLVYLFARAIQWLSFDGSIGVGIIL 65 

Query: 61 FTLIIRAI^PLYNMQMKSSQKMQEIQPRLKELQKKYPGKDPDNRLKLNDEMQSMYKAEG 120 

FTL IR ++MPL+NMQ+KSSQKMQ+IQP L+ELQ+KY GKD R+KL +E Q++YK G 
Sbjct: 66 FTLTIRLMLMPLFNMQIKSSQKl'lQDIQPELRELQRKYAGKDTQTRMKLAEESQALYKKYG 125 

Query: 121 VNPYASVLPLLIQLPVLWALFQALTRVSFLKVGTFLSLELSQPDPYYILPVLAALFTFLS 180 

VNPYAS+LPLLIQ+PV+ ALFQALTRVSFLK GTFL +EL+Q D Y+LPVLAA+FTFLS 
Sbjct: 126 VNPYASLLPLLIQMPVMIALFQALTRVSFLKTGTFLIWELAQHDHLYLLPVLAAVFTFLS 185 
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Query: 181 TWLTNKAAVEKNIALTLMTYVMP F 1 1 LVTS FKFASG V^ YWTVSNAFQVFQILLLNNPYK 240 

TOLTN AA EKN+ +T+M YVMP 4-1 FN ASGWLYWTVSNAFQV Q+LLLNNP+K 
Sbjct: 186 TWLlOTiAAKEKNWMTVMIYVMP^ 245 

Query: 241 IIKVREEAVRVAHEKEQRVKRAKRKASKKR 270 

II R+ E+ R +RA++KA K++ 

Sbjct: 246 I IAERQRLANEEKERRLRERRARKKAMKRK 275 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

A related GBS gene <SEQ ID 8527> and protein <SEQ ID 8528> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: 20 Crend: 5
McG: Discrim Score: 4.90
GvH: Signal Score (-7.5): -0.39

Possible site: 42
>>> May be a lipoprotein

ALOM program count: 4 value: -7.38 threshold:
INTEGRAL Likelihood = -7.38 Transmembrane 54 - 70 ( 52 - 75)
INTEGRAL Likelihood = -5.20 Transmembrane 193 - 209 ( 192 - 211)
INTEGRAL Likelihood = -3.61 Transmembrane 125 - 141 ( 124 - 144)
INTEGRAL Likelihood = -2.44 Transmembrane 168 - 184 ( 167 - 184)
PERIPHERAL Likelihood = 2 217
modified ALOM score: 1.98

*** Reasoning Step: 3

Final Results

bacterial membrane --- Certainty=0.3951 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases:

32.8/62.3% over 242aa



EGAD|17722| stage III sporulation protein j precursor Insert characterized
OMNI|NT01BS4782 -identity Insert characterized
SP|Q01625|SP3J_BACSU STAGE III SPORULATION PROTEIN J PRECURSOR. Edit characterized
GP|40023|emb|CAA44401.1||X62539 unnamed protein product Insert characterized
GP|467388|dbj|BAA05234.1||D26185 stage III sporulation Insert characterized
GP|2636651|emb|CAB16141.1||Z99124 alternate gene name: spo0J87 Insert characterized
PIR|140437|140437 stage III sporulation protein spoIIIJ - Insert characterized

ORF02221(301 - 1014 of 1413)

EGAD|17722|BS4098(3 - 245 of 261) stage III sporulation protein j precursor {Bacillus
subtilis}OMNI|NT01BS4782 -identitySP|Q01625|SP3J_BACSU STAGE III SPORULATION PROTEIN J
PRECURSOR. GP|40023|emb|CAA44401.1||X62539 unnamed protein product {Bacillus
subtilis}GP|467388|dbj|BAA05234.1||D26185 stage III sporulation {Bacillus
subtilis}GP|2636651|emb|CAB16141.1||Z99124 alternate gene name: spo0J87 {Bacillus
subtilis}PIR|140437|140437 stage III sporulation protein spoIIIJ - Bacillus subtilis
%Match = 17.0
%Identity = 32.8 %Similarity = 62.2
Matches = 79 Mismatches = 88 Conservative Sub.s = 71
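
The %Identity and %Similarity values can be related to the Matches, Mismatches and Conservative Sub.s counts by a short calculation. The sketch below is a worked check rather than the method used by the alignment program: the aligned length of 241 columns is an assumption, chosen because it reproduces the printed percentages, and it implies three gap columns in addition to the 238 residue pairs counted.

```python
# Counts as printed for the ORF02221 / Bacillus subtilis SpoIIIJ comparison.
matches = 79
mismatches = 88
conservative = 71

# ASSUMPTION: aligned length including gap columns (not stated in the text).
aligned_length = 241

paired_columns = matches + mismatches + conservative
gap_columns = aligned_length - paired_columns

# Identity counts exact matches; similarity also counts conservative substitutions.
pct_identity = 100.0 * matches / aligned_length
pct_similarity = 100.0 * (matches + conservative) / aligned_length

print(f"paired columns: {paired_columns}, assumed gap columns: {gap_columns}")
print(f"%Identity   = {pct_identity:.1f}")
print(f"%Similarity = {pct_similarity:.1f}")
```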

219 249 279 309 339 393 420 

DFWIARKGVEEII3YQALEKNLIHVLKIAGLI*KGIKLKKKLKTFSLILLTGSLLVACG--RGEVSSHSATLWEQ-IVYA 
= II- = == 111= =:: I >|:: =11 
MLLKRRIG^LSKVGVFMLIiAGCSSVKEPITADSPHFWDKXVVYP 



474 504 534 564 594 624 654 

FAKSIQWLS--FNHSIGLGIILFTLIIRAIMMPLYM4QMKSSQKMQEIQPRLKELQ 
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:== I === « II III |::|| s::|| |::||: II =11 :::|:s|| II = II I :::: 

LSELITYVftKLTGDNYGLSIILVTILIRLLILPLMIKQLRSSKAMQALQPEMQKLKEKnfSSKDQKTQQKLQQETMALFQK 
SO 70 80 90 100 110 120 

5 684 714 744 774 804 834 864 894 

EGVWPYASVLPLLIQLPVLWALFQALTRVSFLKVGTFLSLELSQPDPYyiLP^T^AALFTFLSTWLTNEOUWEKNIALTLM 
Mil I :|:|||:|:| = = :|: I = =11 ::| = lllllll-l : 11= I ::| = ■ I 

HGWPLAGCFPILIQMPILIGFYEffilMRTQAISEHSFLWFDLGEKDPYYILPIVAGVATFVQQKLMMAGMAQQNPQMAMM 
140 150 160 170 180 190 200 

10 

924 954 984 1014 1044 1074 1104 1134 

TYVMPFI ILVTSFNFASGWLYWTVSNAFQVFQILLLNNPYKI I KVREFATOVAHEKEQRVKRAKRKASKKRK*EKHGI I 

::|| :|:| : | | : : | | | | | | : | : | : | 
LWIMPIMI1VFAINFPAALSLYWWGNLFMIAQTFLIKGPDIKKNPEPQKAGGKKK 
15 220 230 240 250 260 

Example 266 

A DNA sequence (GBSx0291) was identified in S.agalactiae <SEQ ID 847> which encodes the amino acid 
sequence <SEQ ID 848>. Analysis of this protein sequence reveals the following: 

Possible site: 46
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3778 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9579> which encodes amino acid sequence <SEQ ID 9580> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAA44400 GB:X62539 homologous to E.coli rnpA [Bacillus subtilis]

Identities = 52/109 (47%) , Positives = 77/109 (69%) , Gaps = 1/109 (0%) 

Query: 21 LKKTYRVKSDKDFQMIFSRGKNVANRKFVIYYLEK-EQKHFRVGISVSKKLGNAWRNAI 79 
LKK R+K ++DFQ +F G +VANR4FV4Y L++ E RVG+SVSKK+GNAV+RN I 
Sbjct: 4 LKKRNRLKKNEDFQKVFKHGTSVANRQFVLYTLDQPENDELRVGLSVSKKIGmVMRNRI 63

Query: 80 KRKIRHVLLSQKTALQDYDFWIARKGVEELDYQALEKNLIHVLKIAGL 128 

KR IR L +K L++ D+++IARK +L Y+ +K+L H+ 4 + L 
Sbjct: 64 KRLIRQAFLEEKERLKEKDYIIIARKPASQLTYEETKKSLQHLFRKSSL 112 


A related DNA sequence was identified in S.pyogenes <SEQ ID 849> which encodes the amino acid 
sequence <SEQ ID 850>. Analysis of this protein sequence reveals the following: 

Possible site: 24

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3820 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 73/109 (66%), Positives = 88/109 (79%) 

Query: 21 LKKTYRVKSDKDFQMIFSRGK1WANRKFVIYYLEKEQKHFRVGISVSKKLGNAWRNAIK 80 
LKKTYRVK +KDFQ IF GK+ ANRKFVIY+L + Q HFRVGISV KK+GNAV RNA+K

Sbjct: 1 LKKTYRVKREKDFQAIFKDGKSTANRKEVIYHLNRGQDHFRVGISVGKKIGNAVTRNAVK 60 

Query: 81 RKIRHVLLSQKTALQDYDFWIARKGVEELDYQALEKNLIHVLKIAGLI 129 
RKIRHV+++ L+ DFWIARKGV L+YQ L++NL HVLK+A L+ 
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Sbjct: 61 RKIRHVIMRLGHQLKSEDFWIARKGVIiaLEYQELQQNLHHVLKLAQLL 109 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 267 

A DNA sequence (GBSx0292) was identified in S.agalactiae <SEQ ID 851> which encodes the amino acid
sequence <SEQ ID 852>. This protein is predicted to be glycerol-3-phosphate dehydrogenase, NAD- 
dependent (gpsA). Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence



Final Results 

bacterial cytoplasm --- Certainty=0.1429 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 8529> which encodes amino acid sequence <SEQ ID 8530> 
was also identified. There is a signal peptide at residues 1-19. The protein has homology with the following 
sequences in the GENPEPT database: 

>GP:AAA86746 GB:U32164 NAD(P)H-dependent dihydroxyacetone-phosphate
reductase [Bacillus subtilis]
Identities = 177/333 (53%) , Positives = 241/333 (72%) 

Query: 18 QKIAVLGPGSWGTAIAQVMIDNGHEVRDWGNVVEQIEEINTNHTNQRyFKDITLDSKIKA 77 

+K+ +LG GSWGTALA VL DNG+EV +W + + I +IN H N+ Y ++ I> + IK 
Sbjct: 2 KKOTMLGAGSWGTALALVLTDNGNEVCVWAHRADLITO 51 

Query: 78 YTNLEEAINNVDSILFWPTKVTRLVAKQV^ 137 

T+++ EA+++ D 1+ VPTK R V +Q + K V +H SKG+EP + R+S I+E 
Sbjct: 62 TTDMKEAVSDADVI IVAVPTKAIREVLRQAVPFITKKAVFVHVSKGIEPDSLLRISEIME 121 

Query: 138 EEISEQYRSDIVWSGPSHAEEAITODITBITA^SKDIEAAKTVQKLFSNHYFRLYTNTD 197 

E+ R DIW+SGPSHAEE +R T +TA+SK + AA+ VQ LF NH FR+YTN D 
Sbjct: 122 IELPSDWRDIVVLSGPSHAEEVGLRHATTVTASSKSMRAAEEVQDLFINHNFRVYTNPD 181 

Query: 198 WGVETAGALKNIIAVGAGALHGLGYGDNAKaAIITRGIAEITRLGVQLGADPLTFSGLS 257 

++GVE GALKNIIA+ AG GLGYGDNAKAS.+ITRGLAEI RLG ++G +PLTFSGL+ 
Sbjct: 182 IIGVEIGGALKNIIALAAGITDGLGYGDNAKAALITRGLAEIARLGTKMGGNPLTFSGLT 241 

Query: 258 GVGDLIVTGTSWSRW^GDALGRGEKLEDIEKNMGJWIEGISTTKVAYEIAQNLNVYM 317 

GVGDLIVT TSVHSRNWRAG+ LG+G KLED+ + MGMV+EG+ TTK AY++++ +V M 
Sbjct: 242 GVGDLITCCTSVHSRNWRAGNLLGKGYKLEDVLEEMG^1VVEGTOTTKAAYQLSKKYDVKM 3 01 



Query: 318 E 

PITEA+++ ++ G ++ +4 +M+ E E 

Sbjct: 302 PITEALHQVLFNGQKVETAVESLMARGKTHEME 334 

A related DNA sequence was identified in S.pyogenes <SEQ ID 853> which encodes the amino acid 
sequence <SEQ ID 854>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.0882 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below: 



WO 02/34771 






PCT/GB01/04789 



Identities = 287/338 (84%) , Positives = 316/338 (92%) 

Query: 15 MTKQKIAVLGPGSWGTAIAQVLNDNGHEVELWGNVVEQIEEINTNHTNQRYFKDITLDSK 74 

MTKQK+A+LGPGSWGTAL+QvX^NGH+VRLWGN+ +QIEEINT HTN+ YFKDI LD 
Sbjct: 1 MTKQKVAILGPGSWGTALSQVLNDNGHDVRLWGNIPDQIEEIOTIOITNRHYFKDIVLDKN 60 

Query: 75 IKAYTNLEFAINNVDSILFWPTKOTRLVAKQVAlttLKHKVVLMHASKGLEPGTHERLST 134 

I A +L 4A+++TO++LFWPTKVTRLVA+QWL +L HKW+MHASKGLEP THERLST 
Sbjct: 61 ITATLDLGQALSDVDAVLFWPTKVTRLVARQVAAILDHKVVVMHASKGLEPETHERIiST 120 

Query: 135 ILEEEISEQYRSDIVWSGPSHAEEAIVRDITLITAASKDIEAAKYVQKLFSNHYFRLYT 194 

ILEEEI +RS++WVSGPSHAEE IVRDITLITAASKDIEAAKYVQ LFSNHYFRLYT 
Sbjct: 121 ILEEEIPAHFRSE^An7VSGPSHAEETIVRDITI J ITAASKDIEAAKYVQSLFSHHYFRLYT 180 

Query: 195 NTDWGVETAGALKNIIAVGAGALHGLGYGDNAKaAIITRGLAEITRLGVQLGADPLTFS 254 

NTDV+GVETAGALKNI IAVGAGALHGLGYGDNAKAA+ ITRGLAEITRLGV+LGADPLT+S 
Sbjct: 1B1 IWDVIGVETAGALKNIIAVGAGALHGLGYGDNAKAAVITRGLAEITRLGVKLGADPLTYS 240 



Query: 315 VYMPITEA.IYKSIYEGANIKDSILDMMSKEFRSENEWH 352 

VYMPIT AIYKS I YEGA+ IK+S IL MMSNEFRSENEWH 
Sbjct: 301 VYMPITTAI YKS I YEGADI KES ILGMMSNEFRSENEWH 338 

SEQ ID 8530 (GBS291) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 59 (lane 5; MW 38.9kDa). It was also expressed in E.coli as a GST-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 77 (lane 2; MW 64kDa). 

GBS291-GST was purified as shown in Figure 226, lane 10-11. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 268 

A DNA sequence (GBSx0293) was identified in S.agalactiae <SEQ ID 855> which encodes the amino acid 
sequence <SEQ ID 856>. This protein is predicted to be glucose-1-phosphate uridylyltransferase (gtaB).
Analysis of this protein sequence reveals the following: 

>>> Seems to have a cleavable N-term signal seq.

Final Results 

bacterial outside --- Certainty=0.3000 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:BAA28714 GB:AB001562 glucose-1-phosphate uridylyltransferase
[Streptococcus mutans] 
Identities = 263/296 (88%) , Positives = 285/296 (95%) 

Query: 2 KTOKAVIPARGLGTRFLPATKALAKEMLPIVDKPTIQFIVEEALKSGIEDILVVTGKSKR 61 

KVl^KAVIPAAGLGTRFLPATKAIiAKEMIjPIVDKPTIQFIVEEALICSGIEDILVVTGKSKR 
Sbjct: 5 KTOKAVIPAAGLGTRFLPATKALAKEMLPIVDKPTIQFIVEEALKSGIEDILVVTGKSKR 64 

Query: 62 SIEDHFDSNFELEYmKEKGKNELLKLVUETTGIRLHFIRQSHPRGLGDAVLQAKAFVGN 121 

SIEDHFDSNFELEYNL++KGK +LLKLV++TT I LHFIRQSHPRGLGDAVLQAKAFVGN 
Sbjct: 65 SIEDHFDSNFELEYNLEQKGKTDLLKLViroTTAriffiHFIRQSHPRGLGDAVLQAKAFVGN 124 

Query: 122 EPFVVMLGDDLMDITNNKVIPljTKQLIIfflFFATHASTIAV^lEVPHEDVSAYGVIAPC^EG 181 
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EPFVVMLGDDLMDIT++K IPLT+QL+ND+E THASTIAVMEVPHEDVSAYGVIAPQGEG 
Sbjct: 125 EPFWMLGDDLMDITDDKAIPLTRQLMNDY3ETHASTIAVMEVPHEDVSAYGVIAEQGEG 184 

Query: 182 VWGLYSWFVEKPSPEEAPSNIAIIGRYLLTPEIFNILETQKPGAGNEIQLTDAIDTLN 241 
V+GLYSV+TFVEKP+P+FAPSNLAIIGRYLLTPEIF ILETQ+PGAGNE+QLTDAIDTLN

Sbjct: 185 VSGLYSVDTFVEKPAPKERPSNLAIIGRYLLTPEIFTILETQEPGAGNEVQLTDAIDTLN 244 

Query: 242 KTQRVFARKFTGDRYDVGDKFGFMKTS IDYALQHPQVKDDLKKYI IDLGKSLEKTS 297 
KTQRVFAR+F G RYDVGDKFGFMKTS IDYAL+HPQVK+DLK YII+LGK L++ S 
Sbjct: 245 KTQRVFAREFKGKRYDVGDKFGFMKTSIDYALKHPQVKEDLKAYIIELGKKLDQKS 300 



A related DNA sequence was identified in S.pyogenes <SEQ ID 857> which encodes the amino acid 
sequence <SEQ ID 858>. Analysis of this protein sequence reveals the following: 



n signal seq. 

Final Results

bacterial outside --- Certainty=0.3000 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below: 

Identities = 257/295 (87%) , Positives = 277/295 (93%) 

Query: 2 KVRKAVIPAAGLGTRFLPATKALAKEMLPIVDKPTIQFIVEEALKSGIEDILVVTGKSKR 61 
         KVRKA+IPAAGLGTRFLPATKALAKEMLPIVDKPTIQFIVEEALKSGIE+ILVVTGK+KR 
Sbjct: 3 KVRKAIIPAAGLGTRFLPATKALAKEMLPIVDKPTIQFIVEEALKSGIEEILVVTGKAKR 62 

Query: 62 SIEDHFDSNFELEYNLKEKGKNELLKLVDETTGIRLHFIRQSHPRGLGDAVLQAKAFVGN 121 
          SIEDHFDSNFELEYNL+ KGKNELLKLVDETT I LHFIRQSHPRGLGDAVLQAKAFVGN 
Sbjct: 63 

Query: 122 
           EPFVVMLGDDLMDITN PLTKQL+ D++ THASTIAVM+VPHEDVS+YGVIAPQG+ 
Sbjct: 123 EPFVVMLGDDLMDITNASAKPLTKQLMEDYDKTHASTIAVMKVPHEDVSSYGVIAPQGKA 182 

Query: 182 VNGLYSVNTFVEKPSPEEAPSNLAIIGRYLLTPEIFNILETQKPGAGNEIQLTDAIDTLN 241 
           V GLYSV+TFVEKP PE+APS+LAIIGRYLLTPEIF ILE Q PGAGNE+QLTDAIDTLN 
Sbjct: 183 VKGLYSVDTFVEKPQPEDAPSDLAIIGRYLLTPEIFGILERQTPGAGNEVQLTDAIDTLN 242 

Query: 242 KTQRVFARKFTGDRYDVGDKFGFMKTSIDYALQHPQVKDDLKKYIIDLGKSLEKT 296 
           KTQRVFAR+F G+RYDVGDKFGFMKTSIDYAL+HPQVK+DLK YII LGK+LEK+ 
Sbjct: 243 KTQRVFAREFKGNRYDVGDKFGFMKTSIDYALEHPQVKEDLKNYIIKLGKALEKS 297 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
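
The alignment summaries in this example report raw counts alongside rounded percentages (for instance "Identities = 257/295 (87%)"). As a minimal sketch, assuming the summary lines follow that `Name = count/length (pct%)` layout, the counts can be pulled out and the exact fractions recomputed. The names `FIELD`, `parse_summary` and `fraction` are hypothetical helpers introduced here for illustration and are not part of any tool named in the source.

```python
import re

# One "Name = count/length (pct%)" field of a BLAST-style summary line.
# Spaces around "=", "/" and inside the parentheses are tolerated because
# the text in this document contains irregular spacing.
FIELD = re.compile(
    r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)\s*\(\s*(\d+)\s*%\s*\)"
)


def parse_summary(line):
    """Return {field name: (count, length, reported percent)}."""
    return {
        m.group(1): (int(m.group(2)), int(m.group(3)), int(m.group(4)))
        for m in FIELD.finditer(line)
    }


def fraction(stats, field):
    """Exact fraction count/length for one field of a parsed summary."""
    count, length, _reported = stats[field]
    return count / length


# Summary line of the GAS/GBS alignment shown above.
line = "Identities = 257/295 (87%) , Positives = 277/295 (93%)"
stats = parse_summary(line)
```

Comparing `fraction(stats, "Identities")` with the reported percentage is a quick way to spot digits that were damaged in transcription; the sketch makes no claim about how the original program rounded its percentages.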



Example 269 

A DNA sequence (GBSx0294) was identified in S.agalactiae <SEQ ID 859> which encodes the amino acid 
sequence <SEQ ID 860>. Analysis of this protein sequence reveals the following: 

Possible site: 42 

>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -4.94 Transmembrane 28 - 44 ( 27 - 45) 

Final Results 

bacterial membrane --- Certainty=0.2975 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAB15143 GB:Z99120 similar to ABC transporter (lipoprotein) 
[Bacillus subtilis] 
Identities = 148/346 (42%), Positives = 222/346 (63%), Gaps = 16/346 (4%) 

Query: 31 LTLLSLSVLTLTACGNRSDKSAN KSDIKVAMVTNQGGVDDKSFNQSAWEGLQKWGKK 87 
          ++L+ + L ACGN S + K+ VAMVT+ GGVDDKSFNQSAWEG+Q +GK+ 
Sbjct: 1 MSLVIAAGTILGACGNSEKSSGSGEGKK-KFSVAMVTDVGGVDDKSFNQSAWEGIQAFGKE 60 

Query: 88    Sbjct: 61 
Query: 147   Sbjct: 121 
             Sbjct: 180 
Query: 267   Sbjct: 237 
Query: 325 


GIj KG NG+DY QS +++D+ NLt A ++LI+G+G+ 1 



K+NVAS+TF + E ++L GVAAA ++K-r +GF+GGME ++K+FE GF+AG 



r++++P V V YAG F A GK A + Y +GVDVIY 4 



+KE K VWVIGVD+DQ 



There is also homology to SEQ ID 862. 

A related GBS gene <SEQ ID 8531> and protein <SEQ ID 8532> were also identified. Analysis of this 
protein sequence reveals the following: 



McG: Length of UR: 19 

Peak Value of BR: 2.31 

Net Charge of CR: 2 
McG: Discrim Score: 5.09 
GvH: Signal Score (-7.5): -3.29 

Possible site: 19 
»> May be a lipoprotein 

Amino Acid Composition: calculated from 21 
ALOM program count: 0 value: 5.20 threshold: 0.0 
PERIPHERAL Likelihood =5.20 90 
modified ALOM score: -1.54 

*** Reasoning Step: 3 

Final Results 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the databases: 

52.8/73.9% over 239aa 
Listeria monocytogenes 
SP|Q48754| CD4+ T CELL-STIMULATING ANTIGEN PRECURSOR. Insert characterized 
GP|7240601|gb|AAB35725.2||S80336 CD4+ T cell-stimulating antigen Insert characterized 

ORF02225(385 - 1086 of 1710) 
SP|Q48754|TCSA_LISMO(8 - 247 of 268) CD4+ T CELL-STIMULATING ANTIGEN 
PRECURSOR. GP|7240601|gb|AAB35725.2||S80336 CD4+ T cell-stimulating antigen {Listeria 
monocytogenes} 
%Match = 21.7 
%Identity = 52.7 %Similarity = 73.8 
Matches = 125 Mismatches = 59 Conservative Sub.s = 



NFLWEK*NKVC*MIFLCYDRNLFLCDYNLLGGSFSV]NRKIIGLTLLSL8VLTLTACGNRSD KSANKS--DIKVAMVT 

= l = = = I : I 111= II I =11 I Hill 

MKKRTFALALSMIIASGVILGACGSSSDDKKSSDDKSSKDFTVAMVT 



NQGGVDDKSFNQSAWEGLQKWGKKKGLTKG-NGFDYFQSSNESDHANWIJDTAASSGYNLIFGIGFGLHDTIEKVSENNKD 
: I! 111 = 1 III 11111111 = 11 = II :|::| = ||::|:|: ||:|| I |:||:|||: I I ||:||: 
DTGGVDDRSFNQSAWEGLQKFGKAITOMEKGTDGYK^LQSASEADYKTNLNTAVRSDYDLIYGIGYKLKDAIEEVSKQKPK 



VKWIVDDIIKGKE^^aSVTFADNEAAYIaAGVAAAKTTKTKTOGFIGGMEG\WKRFEAGFKAGVKSIDPAlKVAVSYAG 

= = mi i = = n i= i ii= =ii ii i mi iii = ii = = i i = nun iiii = = = i == i ii 

WQFAIVDDTIDDRDNWSIGFKDNDGSYLVGWAGLTTKTNKVGFVGGVKGTVIDRFEAGFTAGVKAVNPWAQIDVQYAN 
140 150 160 170 180 190 200 

996 1026 1056 1086 1116 1146 1176 1206 

SFTDAAKGKTIAATQYATGVDVIYQAAGGTGAGIFSEAKTENETRKESNKVWIGVDRDQSQEGNWSKDGKKANFVLAS 

I I 11= ll= = l = = lllll = = llllll 1 = 1 = 111 = = 
DFAKADKGQQIASSMYSSGVDVIFHAAGGTGNGVFAEAKNLKKKDLQMVPYGNSKLGCFGG 
220 230 240 250 260 

A related GBS nucleic acid sequence <SEQ ID 10947> which encodes amino acid sequence <SEQ ID 
10948> was also identified. 

SEQ ID 8532 (GBS 108) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 38 (lane 7; MW 39.6kDa). It was also expressed in E.coli as a GST-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 41 (lane 9; MW 64.6kDa). 

The GBS108-GST fusion product was purified (Figure 202, lane 9) and used to immunise mice. The 
resulting antiserum was used for FACS (Figure 273), which confirmed that the protein is immunoaccessible 
on GBS bacteria. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
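
Each example lists three "Final Results" localization scores and marks the highest as "(Affirmative)". As a minimal sketch, assuming each result line has the shape `bacterial <site> ... Certainty=<value> (<label>)`, the lines can be parsed and the top-scoring site selected. `RESULT`, `parse_final_results` and `predicted_location` are hypothetical names introduced for illustration; the regular expression deliberately tolerates the stray spaces inside numbers that appear in this text.

```python
import re

# "bacterial <site>", any separator, then "Certainty=<number> (<label>)".
# Whitespace inside the number (e.g. "0 .2975", "0 . 0000") is tolerated.
RESULT = re.compile(
    r"^\s*(bacterial \w+)\W*Certainty\s*=\s*([0-9]\s*\.\s*[0-9]+)\s*\(([^)]*)\)"
)


def parse_final_results(text):
    """Map each localization site to (certainty, label)."""
    results = {}
    for raw in text.splitlines():
        m = RESULT.match(raw)
        if m:
            value = float(re.sub(r"\s+", "", m.group(2)))
            results[m.group(1)] = (value, m.group(3).strip())
    return results


def predicted_location(results):
    """Return the site with the highest certainty value."""
    return max(results, key=lambda site: results[site][0])


# Result lines as they appear for SEQ ID 860 in this example.
sample = """
bacterial membrane --- Certainty=0 .2975 (Affirmative) < suco
bacterial outside Certainty=0 . 0000 (Not Clear) < suco
bacterial cytoplasm Certainty=0 . 0000 (Not Clear) < suco
"""
results = parse_final_results(sample)
```

For the sample above, `predicted_location(results)` selects the membrane entry, which agrees with the "(Affirmative)" label in the source.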

Example 270 

A DNA sequence (GBSx0295) was identified in S.agalactiae <SEQ ID 863> which encodes the amino acid 
sequence <SEQ ID 864>. Analysis of this protein sequence reveals the following: 

Possible site: 35 

>» Seems to have a cleavable N-term signal seq. 

INTEGRAL Likelihood =-12.74 

INTEGRAL Likelihood = -3.72 

INTEGRAL Likelihood = -3.19 

INTEGRAL Likelihood = -1.54 

INTEGRAL Likelihood = -0.90 Transmembrane 157 - 173 ( 157 - 173) 

Final Results 

bacterial membrane --- Certainty=0.6095 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database: 

Query: 8 KEYPTTVLLVSLTTLVFLLMQLTYGSQAESSQ\TFQFC-GIQGDYLKRYPTNLWRLISPIF 67 

KE P T +S+T L+F++MQ+ YGS A+S QV+FQFGG+ G +K+ P+ LWRL++PIF 
Sbjct: 5 KEKPVTFFFLSVTILLFIVMQVFYGSWAKSPQWFQFGGMFGLWKSMPSQLWRLVTPIF 64 

Query: 68 VHIGWEHFLLKGLALYFVGQMGESIWGSLRFLILYILSGLMGNIFTLFFTPHWAAGAST 127 

+HIGWEHFL+N L LYFVGQ+ ESIWGS FL+LY+LSG+MGN+ TLFFTPHWAAGAST 
Sbjct: 55 IHIGWEHFLIKSLTLYFVGQLAESIWGSRFFLLLYVLSGIMGNVLTLFFTPHWAAGAST 124 

Query: 128 SLFGVFSAIAIAGYFGKNPYLKQVGKSYQVMI LLNLFFNI FTPGVSLAGHVGGLVGGVLV 187 

SliFG+F+AI + GYFG N LK +GKSYQ +I+LNL N+F P V + GH+GG +GG L 
Sbjct: 125 SLFGLFAAIVWGYFGHNQLLKSIGKSYQTLIIIjNLVKNLFMPNVGlVGHLGGALGGALA 184 

Query: 188 AI FLTKQNGSLLFKTWQSILALMI FI IVS ISLIGLSLV 225 

A+FL + LF Q AL+ ++ +++ LI LSL+ 

Sbjct: 185 AVFLPTLLDAELFTKKQKTSALLSYLTLALVLITLSLM 222 

A related DNA sequence was identified in S.pyogenes <SEQ ID 865> which encodes the amino acid 
sequence <SEQ ID 866>. Analysis of this protein sequence reveals the following: 

Possible site: 43 
»> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -9.92 Transmembrane 214 - 230 ( 212 - 232) 
INTEGRAL Likelihood = - Transmembrane 135 - 151 ( 128 - 153) 
INTEGRAL Likelihood = - Transmembrane 101 - 117 ( 100 - 117) 
INTEGRAL Likelihood = - Transmembrane 183 - 199 ( 182 - 199) 
INTEGRAL Likelihood = - 53 Transmembrane 166 - 182 ( 166 - 182) 



Final Results 

bacterial membrane --- Certainty=0.4970 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 



Query: 106 FLLLYVLSGVMGNAFTFWLTPETVARGASTSLFGLFAAIVVLSFLGKNQALKDLGKSYQT 165 

FLLLYVLSG+MGN T + TP VAAGASTSLFGLFAAIW+ + G NQ LK +GKSYQT 
Sbjct: 95 FLLLYVLSGIMGNVLTLFFTPHWHAGASTSLFGLFAAIVWGYFGHNQLLKSIGKSYQT 154 

Query: 166 LIvVNLLMTOjFMPNVSMAGHIGGWGGALLSIVFPTKMRVITVKKTKRMIALVSYGIILV 225 

LI++NL+MNLFMPNV + GH+GG +GGAL ++ PT + K ++ AL+SY + + 

Sbjct: 155 LIILNLVMNLFMPNVGIVGHLGGALGGAl^VFLPTLLDAELFTKKQKTSALLSYLTLAL 214 

Query: 226 GVLVLGFL 233 

++ L + 
Sbjct: 215 VLITLSLM 222 

An alignment of the GAS and GBS proteins is shown below: 

Identities = 63/132 (47%) , Positives = 92/132 (68%) 

Query: 94 GSLRFLILYILSGLMGNIFTLFFTPHWAAGASTSLFGVFSAIAIAGYFGKNPYLKQVGK 153 

G FL+LY+LSG+MGN FT 4 TP VAAGASTSLFG+F+AI + + GKN LK +GK 
Sbjct: 102 GLTPFLLLYVLSGVMGNAFTFWLTPETVAAGASTSLFGLFAAnfVLSFLGKNQALKDLGK 161 

Query: 154 SYQVMI LLNLFFNI FTPGVSLAGHVGGLVGGVLVAIFLTKQNGSLLFKTWQSIIjALMIFI 213 

SYQ +I++NL N+F P VS+AGH+GG+VGG L++I + + K + +LAL+ + 
Sbjct: 162 SYQTLIVVNLLimjFMPNVSMAGHIGGVTGGALLSIVFPTKMRVITVKKTKRMLALVSYG 221 



Query: 214 IVS ISLIGLSLV 225 

1+ + ++ L + 
Sbjct: 222 I ILVGVLVLGFL 233 

A further corresponding DNA sequence was identified in S.pyogenes <SEQ ID 9083> which encodes the 
amino acid sequence <SEQ ID 9084>. Analysis of this protein sequence reveals the following: 

Possible site: 52 

:leavable N-term signal seq 

12 - 28 ( 7 - 30) 



Final Results 

bacterial membrane --- Certainty=0.4079 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS sequences follows: 



Query: 1 MTQLLKRYPXXXXXXXXXXXXXXAMQVWGHIATGAQAIYQVGGMFGLLVKAMPDQLWRL SO 

M + K YP MQ+ YG A 4Q I+Q GG+ G +KA P LWRL 

Sbjct: 3 MKKFAKEYPTTVLLVSLTTLVFLLMQLTYGSQAESSQVI FQFGGI QGDYLKAYPTNLWRL S2 

Query: 61 VTPXXXXXXXXXXXVNGLTLYFVGQIVEDLWGSRLF 96 

++P +NGL LYFVGQ+ E +WGS F 

Sbjct: 63 I S PI FVH IGWEHFLLNGLALYFVGQMGES IWGSLRF 98 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 271 

A DNA sequence (GBSx0296) was identified in S.agalactiae <SEQ ID 867> which encodes the amino acid 
sequence <SEQ ID 868>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence 



Final Results 

bacterial cytoplasm --- Certainty=0.2055 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 



Query: 1 MEKKLLRKEVLITLKSQPQAYKSEVDCKLLEAFIKTKAYQNSCVIATYLSFDYEYNTQLL 60 

M KK R +V+ LK Q +A K D +LLE 1+ +AYQ + VTATYL+F +E++T LL 
Sbjct: 1 MMKroYRTQVIEDLKKQDKAKKVLPJDEQLLEELIQLEAYQKAHVIATYLAFPFEFDTSLL 60 

Query: 61 IKQALCDGKRVLVPKTYPKGKMIFVDYQKDNLRTTPFGLLEPVNDRAVEKASIDLIHVPG 120 

I+QA D K ++VPKTYP+ KMIFV Y + +L+ T FGL EP ++ A+EK++IDLIHVPG 
Sbjct: 61 IEQAQFJ3NKSIWPKTYPQRKMIFVVYDEADLQITKFGLKEPRSEEALEKSAIDLIHVPG 120 

Query: 121 LIFNNKGFRIGYGAGYFDRYLSDFEGDTISTIYRCQRQDFVEEKHDVAVKEVL 173 

L FNN+G+RIG+GAGY+D+YL+DF+GDT+STIY Q+ F D+ VKEVL 

Sbjct: 121 I^FNNEGYRIGFGAGYYDQYIjADFQGDTVSTIYSFQQFTFEPSFFDIPVKEVL 173 

A related GBS nucleic acid sequence <SEQ ID 10925> which encodes amino acid sequence <SEQ ID 
10926> was also identified. 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 272 

A DNA sequence (GBSx0297) was identified in S.agalactiae <SEQ ID 869> which encodes the amino acid 
sequence <SEQ ID 870>. Analysis of this protein sequence reveals the following: 

Possible site: 43 

>» Seems to have no N-terminal signal sequence 
INTEGRAL Likelihood = -1.44 
INTEGRAL Likelihood = -0.22 



Final Results 

bacterial membrane --- Certainty=0.1574 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9305> which encodes amino acid sequence <SEQ ID 9306> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAD33517 GB:AF132127 glucose-6-phosphate isomerase 
[Streptococcus mutans] 
Identities = 344/401 (85%) , Positives = 374/401 (92%) 



Query: 

++LP+NYDKEEF+RI+KAAEKIKSDSEVLWIGIGGSYLGA+AAIDFLN+ F NL+ EE 
Sbjct: 49 IjNLEQNTOKEEFARIKKAAEKIKSDSEVLWIGIGGSVI/iARAAIDFIiNSSFVNLENKEE 108 

Query: 61 RKAEQILyAGNSISSTYLADLVEWQDKEFSVWISKSGTTTEPAIAFRVFlffiLLVKICYG 120 

RKAPQILYAGNSISS YLADLV+YV DK+FSVNvISKSGTTTEPAIAFRVFK+LLVHCTG 
Sbjct: 109 RKAPQILYAGNSISSOTLADLVDyVAD:<DFSVIWIS:<SGTTTEPAIAFRVFKDLLVKKYG 168 

Query: 121 QEEMKHIYATTDKVKGAVKv^MANNWETF\VPDNVGGRFSVLTAVGLLPIAASC3ADIT 180 

QEEAN+RIYATTDWKGAVKVEADAN WETFWPD+VGGRF+VLTAVGLLPIARSGAD+ 
Sbjct: 169 QEFJ^QRIYATTDRVKGAVKVEADANGWETFWPDSVGGRFTVLTAVGLLPIAASGADLD 228 

Query: 181 ALMEGANAARKDLSSDKISENIAYQyAAVRN\'LYRKGyiTEILANYEPSLQYFGEWWKQL 240 

LM GA AAR+D SS ++SEN AYQYAA+RN+LYRKGY+TE+LANYEPSLQYF EWWKQL 
Sbjct: 229 QLMAGAEAARQDYSSAELSENEAYQYAAIRNILYRKGYVTEV1ANYEPSLQYFSEVWKQL 288 

Query: 241 AGESEGKDQKGIYPTSANFSTDLHSLGQFIQEGYRNLFETVVRVEKPRKNVTIPELTEDL 300 

AGESEGKDQKGIYPTSANFSTDLHSLGQFIQEG RNLFETV+RVEK RKN+ +PE EDL 
Sbjct: 289 AGESEGKDQKGIYPTSANFSTDLHSLGQFIQEGNFJILFETVIRVEKARKNILVPEAAEDL 348 

Query: 301 DGLGYLQGKDVDFVraOCATDGVLIAHTDGGVPimFVTLPTQDAYTLGYTIYFFELAIGLS 360 

DGL YLQGKDVDFVNKKATDGVLLAHTDGGVPN F+T+P QD +TLGY IYFFELAIGLS 
Sbjct: 349 DGIAYLQGKDTOFVmKATDGvLLMTDGGVPOTFLTIPEQDEFTLGWIYFFEIA.IGLS 408 

Query: 361 GYLNSVNPFDQPGVEAYKRNMFALLGKEGFEELSAELNARL 401 

GYLN VNPFDQPGVEAYK+NMFALLGKPGFEEL ASLNARL 
Sbjct: 409 GYLNGVNPFDQPGVEAYKKNMFALLGKPGFEELGAELNARL 449 

A related DNA sequence was identified in S.pyogenes <SEQ ID 871> which encodes the amino acid 
sequence <SEQ ID 872>. Analysis of this protein sequence reveals the following: 

Possible site: 31 

»> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -1.44 Transmembrane 209 - 225 ( 209 - 225) 
INTEGRAL Likelihood = -0.22 

Final Results 

bacterial membrane --- Certainty=0.1574 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

>GP:AAD33517 GB:AF132127 glucose-6-phosphate isomerase 
[Streptococcus mutans] 

Identities = 369/449 (82%) , Positives = 408/449 (90%) 

Query: 1 MSHITFDYSIOTLESFAGQHEIDFLQGQVTEADKLLREGTGPGSDFLGWLDLPENYDKDEF SO 
M+HI FDYSKVL F HE+D++Q QVT AD+ LR+GTGPG++ GWL+LP+NYDK+EF 
15 Sbjct: 1 MTHIKFDYSKVLGKFIASHELDYIQMQVTAADEAI^KGTGPGAEOTGWLNLPQNYDKEEF SO 

Query: 51 ARILTAAEKIKADSEVLWIGIGGSYLGAICAAIDFLNHHFANLQTAKERKAPQILYAGNS 120 

ARI AAEKIK+DSEVLWIGIGGSYLGA4AAIDFLN F NL+ +ERKAPQILYAGNS 
Sbjct: 61 ARIKKAAEKIKSDSEVLWIGIGGSYLGARAAIDFLNSSFVNLENKEERKAPQILYAGNS 120 

20 

Query: 121 ISSTYIiADLVEYVQDKEFSVNVISKSGTTTEPAIAFRVFKELLVKKYGQEEANKRIYATT 180 

ISS YLADLV+YV DK+FSVNVISKSGTTTEPAIAFRVFK+LLVKKYGQEEAN+RIYATT 
Sbjct: 121 ISSNYLADLVDYVADKDFSVNVISKSGTTTEPAIAFRVFKDLLVKKYGQEEANQRIYATT 180 

25 Query: 181 DKVKGAVTOTEADANISIWETFWPDIWGGRFSVLTAV^^ 240 

D+VKGAVKVEADAN WETFWPD+VGGRF+VLTAVGLLPIAASGAD+ LM GA AAR+D 
Sbjct: 181 DRWGAVKVEADANGWETFWPDSVGGRFTVLTAVGLLPIAASGADLDQLMAGAEAARQD 240 

Query : 241 LSSDKISENIAYQYAAVRNVLYRKGY1TEILANYEPSLQYFGEWWKQLAGESEGKDQKGI 300 
30 SS ++SEN AYQYAA+RN+LYRKGY+TE+LANYEPSLQYF 3WWKQLAGESEGKDQKGI 

Sbjct: 241 YSSAELSENEAYQYAAIRNILYRKGYVTEVLANYEPSLQYFSEWWKQliAGESEGKDQKGI 300 

Query: 301 YPTSANFSTDLHSLGQFIQEGYRNLFETVIRVDNPRKNVIIPEIiAEDLDGLGYLQGKDVD 360 
YPTSANFSTDLHSLGQFIQEG RNLFETVIRV+ RKN+++PE AEDLDGL YLQGKDVD 
35 Sbjct: 301 YPTSANFSTDLHSLGQFIQEGNRNLFETVIRTOKARKNILVPEAAEDLDGLAYLQGKDVD 360 

Query: 361 FVNK^TDGVLIjAHTDGGVPNMFVTLPAQDEFTLGYTIYFFEI^AIAVSGYMNAVNPFDQP 420 

FWKKATDGVLLAHTDGGVPN F+T+P QDEFTLGY IYFFELAI +SGY+N VNPFDQP 
Sbjct: 361 FWKKATDGVLLAHTDGGVPNTFLTIPEQDEFTLGYV-IYFFELAIGLSGYLNGVNPFDQP 420 

40 

Query: 421 GVEAYKRNMFALLGKPGFEALSAELNARL 449 

GVEAYK+NMFALLGKPGFE L AELNARL 
Sbjct: 421 GVEAYKKNMFALLGKPGFEELGAELNARL 449 

The protein has homology with the following sequences in the databases: 

>GP:CAB907E5 GB:AJ400707 hypothetical protein [Streptococcus 

Identities = 58/91 (53%) , Positives = 69/91 (75%) 

50 Query: 6 KRYPITIFIiGLTGLIFIAMQVYYGHIATGAQAIYQVGGMFGLLVKAMPDQLWRLVTPIF 55 

K P+T F L +T L+FI WQV YG A Q ++Q GGMFGL+VK+MP QLWRLVTPIF 
Sbjct: 5 KEKPTOFFFLSOTILLFIVMQVFYC-SWAKSPQWFQFGGMFGLWKSMPSQLWRLVTPIF 54 

Query: 66 1HIGFGHFFVNGLTLYFVGQIVEDLWGSRLF 96 
55 IHIG+ HF +N LTLYFVGQ+ E +WGSR F 

Sbjct: 65 IHIGWEHFLINSLTLYFVGQLAESIWGSRFF 95 

An alignment of the GAS and GBS proteins is shown below: 

Identities = 380/401 (94%) , Positives = 392/401 (96%) 

60 

Query: 1 MDLPENYDKEEFSRIQKAAEKIKSDSEVLWIGIGGSYLG1AKAAIDFLNNHFANLQTAEE 60 

+DLPENYDK+EF+RI AAEKIK+DSEVLWIGIGGSYLGAKAAIDFLN+HFANLQTA+E 
Sbjct: 49 LDLPENYDKDEFARILTAAEKIKADSEVLWIGIGGSYLGAKAAIDFLNHHFANLQTAKE 108 

Query: 61 RKAPQILYAGNSISSTYLRDJjVEWQDKEPSVNVISKSGTTTEPAIAFRVFKELLVKKYG 120 

RKAPQILYAGNSISSTyiADLVEWQDKEFSVMTISKSGTTTEPAIAFRVFKELLVKICyG 
Sbjct: 109 RKAPQILYAGNSISSTYLRDLVEYVQDKEFSVNVISKSGTTTEPAIAFRVFKELLVKKYG 168 

Query: 121 QEEANICRIYATTDKVKGAVKVEADATOIWETFWPDWGGRFSVLTAVGLLPIAftSGADlT 180 

QEEANEOlIYATTDKVKGAVKVEAnANtJWETFVVPDNVGGRFSVLTAVGjjLPIARSGADIT 
Sbjct: 169 QEF^KRIYATTDKVKGAVKVEADANMffiTFVVPIOTGGRFSVLTAVGLLPIAASGADIT 228 

Query: 181 AMEGANAARIQDLS8DKISENIAYQYAAVRNVLYRKGYITEIMWYEPSLQYFGEWWKQL 240 

ALMEGANAiWKDLSSDKISENIAYQYARVRNVLYRKGYITETlANYEPSLQYFGEWWKQL 
Sbjct: 229 ALMEGANAM^KDLSSDKISENIAYQYAAVRNVLYRKGYITEILAlSTYEPSIiQYFGEWWKQL 288 

Query: 241 AGESEGKDQKGIYPTSANFSTDLHSLGQFIQEGYRNLFErWRVEKPRKMTIPELTEDL 300 

AGESEGKDQKGIYPTSANFSTDLHSLGQFICEGYRNLFETV+RV4 PRKNV IPEL EDL 
Sbjct: 289 AGESEGKDQKGIYPTSANFSTDLHSLGQFIQEGYRNLFETVIRVDNPRKNVIIPELAEDL 348 

Query: 301 DGLGYLQGKDVDFVWKKATDGVLIAHTDGGVPIIMFVTLPTQDAYTLGYTIYFFEIAIGLS 360 

DGLGYLQGKDVDFVHKKATDGVLtAHTDGGVPHMFVTLP QD +TLGYTIYFFELAI +S 
Sbjct: 349 DGLGYr^KDVDFWOATDGVLIAHTEGGVPIJMFVTLPAQDEFTLGYTIYFFEIAIAVS 408 

Query: 361 GYUStSVNPFDQPGVEAYKRNMFALLGKPGFEELSAEBNARL 401 

GY+N+VNPFDQPGVEAYKRNMFALLGKPGFE LSAELNARL 
Sbjct: 409 GYMNAVNPFDQPGVFAYKRNMFALLGKPGFEALSAELNARL 449 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 273 

A DNA sequence (GBSx0298) was identified in S.agalactiae <SEQ ID 873> which encodes the amino acid 
sequence <SEQ ID 874>. Analysis of this protein sequence reveals the following: 
Possible site: 38 

»> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -2.66 Transmembrane 654 - 670 ( 653 - 671) 
INTEGRAL Likelihood = -1.65 



Final Results 

bacterial membrane --- Certainty=0.2062 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9463> which encodes amino acid sequence <SEQ ID 9464> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAA81906 GB:U04863 alcohol dehydrogenase 2 [Entamoeba 
histolytica] 

Identities = 536/864 (62%), Positives = 663/864 (76%), Gaps = 3/864 (0%) 

Query: 20 ETTDVALAIDTLVQNGLKALDEMR--QLNQEQVDYIVAKASVAALDAHGELALHAVEETG 77 

+T V 1+ LV+ AL E + QE++DYIV KASVAALD H LA AVEETG 
Sbjct: E QTMTVDEHINQLTOKAQVALKEYLKPEYTQEKIOTIVKKASVAALDQHCALAAAAVEETG 64 

Query: 78 RGVFEDKATKI&FACSHVVMIMP^TKWGVIEEDDVTGLTLIAEPVGvVCGITPTTNPTS 137 

RG+FEDKATKN+-FACEHV + MRH KTVG+I D + G+T IAEPVGWCG+TP TNPTS 
Sbjct: 65 RGIFEDKATKNIFACEHVTHEMRHAKWGIINVDPLYGITEIAEPVGWCGVTPVTNPTS 124 

Query: 138 TAIFKSLISLKT'RNPIIFAFHPSAQJESSAHAARIVREfiAIAAGAPENCVQWIEQPSIDAT 197 

TAIFKSLIS+KTRNPI+F+FHPSA + S AA+IVRDAAIAAGAPENC+QWIE 
Sbjct: 125 TAI FKSL1 S IKTRNPI VFSFHPSALKCSIMAAKI VRDAAIAAGAPENCI QWIEFGGIEAS 184 

Query: 198 NALMOTIX3IATIIATGGNAMVKftAYSCGKPALGyGAGWPAYVEKSANIRQA&HDIVMSK 257 
N LMNH G+ATILATGGNAMVKAAYS GKPALGVGAGNVP Y+EK+ NI+QAA+D+VMSK 
Sbjct: 185 NKLMJraPGVATIIATGGNAMVKAAYSSGKPALGVGAGKV?TYIEKTCNIKQ7AflNDVVMSK 244 

Query: 258 SFDNGNWCaSEQAVIIDKEIYKEFVEEFKSYHTYFVNKKEKMiIjEEFCFGAKANSKNCAG 317 
           SFDNGM+CASEQA IIDKEIY + VEE K+ YF+N++EKA LE+F FG A S + 
Sbjct: 245 SFDNGMICASEQAAIIDKEIYDQVVEEMKTLGAYFI1KEEKAKLEKFMFGVHAYSADVNN 304 

Query: 318 AKMPNIVGKSAWIAEQAGFTVPEGTraLAAECTEVSEKEPLTREKLSPVIAVLKAEST 377 
           A+LNP G S W AEQ G VPE NI+ A C EV EPLTREKLSPV+A+LKAE+T 
Sbjct: 305 ARLNPKCPGMS PQWFAEQVGI KVPEDCNI I CAVCKEVGPNEPLTREKLSPVIiAI LKAENT 364 

Query: 378 EDGVEKARQMVEFNGLGHSAAIHTKDADLAREFGTRIRAIRVIWNSPSTFGGIGDVYWAF 437 
           +DG++KA MVEFHG GHSAAIH+ D + ++ ++A R++ N+PS+ GGIG +YN 
Sbjct: 365 QDGIDKAEAMVEFNGRGHSAAIHSNDKAVVEKYALTMKACRILHI^PSSQGGIGSIYNYI 424 

Query: 438 LPSLTLGCGSYGRNSVGDWSAINLIiNIKKVGRRRISINMQWFKVPSKTYFERDSIQYLQKC 497 
           PS TLGCGSYG NSV NV+ NLLNIK++ RRNN+QWF+VP K +FE Sl+YL + 
Sbjct: 425 WPSFTLGCGSYGGNSVSANVTYHNL]^IKRIJU3PJRH(^WFRVPPKIFFEPHSIRYLAEL 484 

Query: 498 RDVERVMIVTDHAMVELGFLDRI IEQLDLRRNKWYQI FAEVEPDPDITTVMKGTDLMRT 557 
           +++ ++ IV+D M +LG++DR+++ L R N+V +IF +VEPDP I TV KG +M T 
Sbjct: 485 KELSKIFIVSDRMMYKLGYVDRVMDVLKRRSNEVE ISI FIDVEPDPS IQTVQKGLAVMMT 544 

Query: 558 FKPDTIIALGGGSPMDAA.KVMWLFYEQPEVDFHDLVQKFMDIRKRAFKFPELGKKTKFVA 617 
           F PD IIA+GGGS MDAAK+MWL YE PE DF + QKF+D+RKRAFKFP +GKK + + 
Sbjct: 545 FGPDNIIAIGGGSAMDAAKIMWLLYEHPEADFFAMKQKFIDLRKRAFKFPTMGKKARLIC 604 

Query: 618 IPTTSGTGSEVTPFAVISDKANNRKYPIADYSLTPTVAIVDPALVMTVPGFIAADTGMDV 677 
           I PTTSGTGSEVTPFAVI SD +KYP+ADYSLTP+VAIVDP M++P ADTG+DV 
Sbjct: 605 IPTTSGTGSEOTPFAVISDHETGKKYPLADYSL.TPSVAIVDPMFTMSLPKRA1ADTGLDV 664 

Query: 678 LTHATEaWSQMANDYTDGLALQAIKIVFDYLEP^VKDADFEAREKMHNAST^GMAFAN 737 
           L HATEAYVS MAN+YTDGLA +A+K+VF+ L +S + D EAREKMHNA+T+AGMAFA+ 
Sbjct: 665 LVHATETiWSVMAIffiYTDGIAREAVKLVFENLLKSY-NGDLEAREKMHNAATIAGMAFAS 723 

Query: 738 AFLGISHSMAHKIGAQFHTVHGRTNAILLPYVIRYNGTRPAKTATWPKYNYYRADEKYQD 797 
           AFLG+ HSMAHK+GA FH HGR A+LLP+VIRYNG +P K A WPKYN+Y+AD++Y + 
Sbjct: 724 AFLGMDHSMAHKVGAAFHLPHGRCVAVLLPHVIRYNGQKPRKIAMWPKYNFYKADQRYME 783 

Query: 798 IAKLLGLPAATPEEAVESYAKAVYDLGTRLGIKMNFRDQGIDEKEWKEKSRELAFLAYED 857 
           +A+++GL TP E VE++AKA +L F+ IDE W K E+A LA+ED 
Sbjct: 784 IAQMVGLKCNTPAEGVEAFAKACEELMKATETITGFKKANIDE^WSKVPEMALLAFED 843 

Query: 858 QCSPANPRLPMVDHMQEI IEDAYY 881 
           QCSPANPR+PMV M44-I++ AYY 
Sbjct: 844 QCSPANPRVPMVKDMEKILKAAYY 867 





A related DNA sequence was identified in S.pyogenes <SEQ ID 875> which encodes the amino acid 
sequence <SEQ ID 876>. Analysis of this protein sequence reveals the following: 

Possible site: 55 

>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -3.66 Transmembrane 643 - 659 ( 642 - 660) 
INTEGRAL Likelihood = -1.81 Transmembrane 102 - 118 ( 102 - 118) 

Final Results 

bacterial membrane --- Certainty=0.2466 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

>GP:AAA81906 GB:U04863 alcohol dehydrogenase 2 [Entamoeba 
histolytica] 

Identities = 535/870 (61%), Positives = 669/870 (76%), Gaps = 3/870 (0%) 



Query: 6 NTVETTSVSVTIDALVQKGLAALEEMRKLD--QEQVDYIVAKASVAALDAHGELAKHAYE 63 

Query: 64 ETGRGVEBDKATKHLFACEHVVNKMRHQKTVQIIEEDDVTGLTLIAEPVGVICGITPTTN 123 

ETGRG+ FEDKATK+ + FACEHV + MRH KTVGII D + G+T IAEPVGV+CG+TP TN 

Sbjct: 62 ETGRGIFEDKATKNIFACEHVTHEMRHAK3VGIINVDPLYGITEIAEPVGWCGVTPVTN 121 

Query: 124 PTSTAIFKSLISLKTRNPIIFAFHPSAQESS2\H2yVRIVRDAMAAGAPENCVQWVETPSIi 183 

PTSTAIFKSLIS+KTRNPI+F+FHPSA + S AA+IVRDAAIAAGAPENC+QW+E + 

Sbjct: 122 PTSTAIFKSLISIKTRNPIVFSFHPSALKCSIMA^KIVRDAAIAAGAPENCIQWIEFGGI 181 



Query: 244 MSKSFDNGMVCASEQAVIIDKEITODF , \7AEFKSYHTYEVNKKEK2\LLEEFCFGAKANSKN 303 

MSKSFDNGM+CASEQA IIDKEIYD V E K+ YF+N++EKA LE+F FG A S + 
Sbjct: 242 MSKSFDNGMICASEQAAIIDKEIYDQWEEMKTLGAYFINEEEKAKLEKFMFGVNAYSAD 301 

Query: 304 CAGAKLNPNIVGKPATWIAEQAGFTVPEGTNIIAAECKEVSENEPLTREKLSPVIAVLKS 363 

A+LNP G W AEQ G VPE NI+ A CKEV NEPLTREKLS PV+A+LK+ 
Sbjct: 302 VNNARLNPKCPGMS PQWFAEQVG I KVPEDCNI I CAVCKEVGPNEPLTREKLS PVLAILKA 361 

E+ +DG++KA MVEFNG GHSAAIH+ D ^4 +++ ++A R++ N+PS+ GGIG +Y 
Sbjct: 362 EOTQDGIDKAEAMVEFNGRGHSAAIHSNDKAVVEKYALTMKACRILHNTPSSQGGIGSIY 421 

Query: 424 NAFLPSLTLGCGSYGRNAVGDNVSAINLLNIKKVGRRRNNMQWFKVPSKTYFERDSIQYL 483 

N PS TLGCGSYG N+V NV+ NLLNIK++ RRNN+QWF+VP K +FE SI+YL 
Sbjct: 422 NYIWPS FTLGCGS YGGNS VSANVTYHNLIiNIKRIiADRRNNLQWFRVPPKI FFEPHSI RYL 481 

Query: 484 QKCRDVERVMIVTDHAIWELGFLDRIIEQLDLRRNKVVYQIFAEVEPDPDITTVMKGTEL 543 

+ +++ ++ IV+D M +LG++DR+++ L R N+V +IF +VEPDP I TV KG + 
Sbjct: 482 AELKELSKIFIVSDRtMYKLGYVDRVIffiVLKRRSKEVEIEIFIDVEPDPSIQTVQKGIiAV 541 

Query: 544 MRTFKPDTIIALGGGSPMDAAKVMWLFYBQPEVDFHDIjVQKFMDIRKRAFKFPELGKKTK 603 

M TF PD IIA+GGGS MDAAK+MWL YE PE DF + QKF+D+RKRAFKFP +GKK + 
Sbjct: 542 MTrFGPDNIIAIGGGSAMDAF^KIMWLLYEHPEADFFAMKQKFIDLRKRAFKFPTMGKKRR 601 

Query: 604 FVAIPTTSGTGSEVTPFAVISDKANNRKYPIADYSLTPTVAIVDPALVLTVPGFIAADTG 663 

+ IPTTSGTGSEVTPFAVISD +KYP+ADYSLTP+VAIVDP +++P ADTG 
Sbjct: 602 LICIPTTSGTGSEVTPFAVISDHETGKKYPLADYSLTPSVAIVDPMFTMSLPKRAIADTG 661 

Query: 664 ^VLTHATFAYVSQMAITOFTDGIALQAIKIVFDNLEKSVKTADFFAREKMHNASTMAGMA 723 

+DVL HATEAYVS MAN++TDGLA +A+K+VF+NL KS D EAREKMHNA+T+AGMA 
Sbjct: 662 LDVLVHATEAYVSVMMTOYTDGIAREAViaVFENLLKSY-NGDLEAREKMHNAATlAGMA 720 

Query: 724 FANAFLGISHSM7AHKIGAQFHTVHGRTNAILLPYVIRYNGTRPAKTATWPKYNYYRADEK 783 

FA+AFLG+ HSMAHK+GA FH HGR A+LLP+VIRYNG +P K A WPKYN+Y+-AD++ 
Sbjct: 721 FASAFLGMDHSMAHKVGAAFHLPHGRCVAVIjLPHVIRYHGQKPRKLJ^flPKYNFYKADQR 780 

Query: 784 YQDIAPCLLGLPASTPEEAVESYAKAVYDLGCRVGIQMNFKRQGIDENEWKEHSRELAYLA 843 

Y ++A+++GL +TP E VE++AKA +L FK IDE W E+A LA 

Sbjct: 781 YMEIAQIWGLKCNTPAEGvEAFAKACEEIMKAT^ 840 

Query: 844 YEDQCSPANPRLPMVDHMQEIIEDAYYGYA 873 

+EDQCSPANPR+PMV M++I++ AYY A 
Sbjct: 841 FEDQCSPANPRVPMVKDMEKILKRAYYPIA 870 

An alignment of the GAS and GBS proteins is shown below: 

Identities = 827/880 (93%), Positives = 852/880 (95%) 

Query: 12 MTEKTKAVETTDVAI^IDTLVQNGLKftLDEI^QI^QEQVDYIVAKASVAALDAHGELALH 71 

MTE VETT V++ ID LVQ GL AL+EMR+L+QEQVDYIVAKS£VAALDAHGELA H 
Sbjct: 1 MTEGHWTVETTSVSVTIDJ^VQKGIJ^EEMRKIJDQEQVDYIVAKASVAA^ 60 

Query: 72 AVEETGRGVFEDKATKNLFACEHVVMSMRH^^ 131 



WO 02/34771 



PCT/GB01/04789 



           A EETGRGVFEDKATK+LFACEHVVNNMRH KTVG+IEEDDVTGLTLIAEPVGV+CGITP 
Sbjct: 61 

Query: 132 
Sbjct: 121 

Query: 192 
           PS+ +ATNALMNHDG I AT I LATGGNAMVKAAYS CGKPALflVGAGNV PAYVEKSANIRQAAH 
Sbjct: 181 

Query: 252 
           DIVMSKSFDNGMVCASEQAVI IDKEIY +FV EFKSYHTYFVNKKEKALLEEFCFGAKAN 
Sbjct: 241 

Query: 312 
           SKNCAGAKLNPNIVGK A WIAEQAGFTVPEGTNILAAEC EVSE EPLTREKLS PVIAV 
Sbjct: 301 

Query: 372 
           LK+ES EDGVEKARQMVEFNGLGHSAAIHT DA+LA+EFGTRIRAIRVIWNSPSTFGGIG 
Sbjct: 361 

Query: 432 
Sbjct: 421 

Query: 492 
           QYLQKCRDVERVMIVTDHAMVELGFLDRIIEQLDLRRNKVVYQIFAEVEPDPDITTVMKG 
Sbjct: 481 

Query: 552 
           T+LMRTFKPDTIIALGGGSPMDAAKVMWLFYEQPEVDFHDLVQKFMDIRKRAFKFPELGK 
Sbjct: 541 

Query: 612 
           KTKFVAIPTTSGTGSEVTPFAVISDKANNRKYPIADYSLTPTVAIVDPALV+TVPGFIAA 
Sbjct: 601 

Query: 672 
           DTGMDVLTHATEAYVSQMAND+TDGLALQAIKIVFD LE+SVK ADFEAREKMHNASTMA 
Sbjct: 661 

Query: 732 
           GMAFANAFLGISHSMAHKIGAQFHTVHGRTNAILLPYVIRYNGTRPAKTATWPKYNYYRA 
Sbjct: 721 

Query: 792 
           DEKYQDIAKLLGLPA+TPEEAVESYAKAVYDLG R+GI+MNF+ QGIDE EWKE SRELA 
Sbjct: 781 

Query: 852 
           +LAYEDQCSPANPRLPMVDHMQEIIEDAYYGY ERPGRRK 
Sbjct: 841 YLAYEDQCSPANPRLPMVDHMQEIIEDAYYGYAERPGRRK 880 

A related GBS gene <SEQ ID 8533> and protein <SEQ ID 8534> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 10 
McG: Discrim Score: -4.68 
GvH: Signal Score (-7.5): -2.48 

Possible site: 21 
»> Seems to have no N-terminal signal sequence 
ALOM program count: 1 value: -2.66 threshold: 0.0 

INTEGRAL Likelihood = -2.66 Transmembrane 100 - 116 ( 99 - 117) 
PERIPHERAL Likelihood = 3.61 173 
modified ALOM score: 1.03 

*** Reasoning Step: 3 

Final Results 

bacterial membrane --- Certainty=0.2062 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

SEQ ID 8534 (GBS432) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 173 (lane 5; MW 66kDa). It was also expressed in E.coli as a His-fusion product. 
SDS-PAGE analysis of total cell extract is shown in Figure 77 (lane 7; MW 41kDa). 

GBS432-GST was purified as shown in Figure 223, lane 9. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
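
The "INTEGRAL Likelihood ... Transmembrane start - end" lines in this example describe predicted membrane-spanning segments. The sketch below, which assumes that line layout, extracts each segment and picks the one with the lowest likelihood value. Treating a more negative value as a stronger prediction is an assumption made here for illustration (the source does not state the scoring convention), and `Segment`, `SEG`, `parse_segments` and `strongest` are hypothetical names.

```python
import re
from typing import NamedTuple


class Segment(NamedTuple):
    likelihood: float  # score printed after "Likelihood ="
    start: int         # first residue of the predicted segment
    end: int           # last residue of the predicted segment


# Matches the core of one INTEGRAL line; the bracketed outer range that
# follows it in the listing is ignored in this sketch.
SEG = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+\.\d+)\s+Transmembrane\s+(\d+)\s*-\s*(\d+)"
)


def parse_segments(text):
    """Return every complete INTEGRAL/Transmembrane entry found in text."""
    return [Segment(float(a), int(b), int(c)) for a, b, c in SEG.findall(text)]


def strongest(segments):
    """Segment with the most negative likelihood (assumed strongest)."""
    return min(segments, key=lambda s: s.likelihood)


# The two entries reported for SEQ ID 876 in this example.
sample = """
INTEGRAL Likelihood = -3.66 Transmembrane 643 - 659 ( 642 - 660)
INTEGRAL Likelihood = -1.81 Transmembrane 102 - 118 ( 102 - 118)
"""
segments = parse_segments(sample)
```

Entries whose transmembrane range is missing from the listing are simply skipped by the pattern, so a damaged line does not produce a partial record.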

Example 274 

A DNA sequence (GBSx0299) was identified in S.agalactiae <SEQ ID 877> which encodes the amino acid 
sequence <SEQ ID 878>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.3444 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database, but there is 
homology to SEQ ID 880. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 275 

A DNA sequence (GBSx0300) was identified in S.agalactiae <SEQ ID 881> which encodes the amino acid 
sequence <SEQ ID 882>. Analysis of this protein sequence reveals the following: 

Possible site: 26 

»> Seems to have a cleavable N-term signal seq. 

INTEGRAL Likelihood = -8.39 Transmembrane 74 - 90 ( 69 - 94) 
INTEGRAL Likelihood = -5.31 Transmembrane 168 - 184 ( 163 - 186) 
INTEGRAL Likelihood = -4.83 Transmembrane 34 - 50 ( 29 - 52) 
INTEGRAL Likelihood = -0.75 Transmembrane 202 - 218 ( 202 - 219) 



Final Results 

bacterial membrane --- Certainty=0.4354 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAA17305 GB:AL021926 hypothetical protein RvOUl [Mycobacterium 
tuberculosis] 

Identities = 70/218 (32%) , Positives = 104/218 (47%) , Gaps = 12/218 (5%) 

Sbjct: 39 LRAIAVALVLASHGGIPGMGGGFIGVDAFFVLSGFLITSLLLDELGRTGRIDLSGFWIRR 98 

Query- 69 FYRIFPPLVLMVLVTIPFVFLVKSDFRASIGSQIMTALGFTSNFYEILTGGNYESQFI-P 127 

R+ P LVLMVL L + S + A +T+N+ + +Y +Q P 

Sbjct: 99 ARRLLPALVL^TOSAARALFPDa^TGLRSDAIAaFLVffANWRFVAQNTDYFTQGAPP 158 

Query: 128 HLFVHTWSLSIEWYVLWGL TVWLLSKRSKDQKQLRGTLFLISMGIFGVSFLTMF 183 

HTOSL +E +YV+W L LLH- R++ ++ R T+ + F ++ L 

Sbjct: 159 SPLQHTWSLGVEEQYYVVWPLLLIGATIilAARAR-RRCRRATVGGVRFAAFLIASLGTM 217 



Query: 184 VRAFFVDNFST------IYFSTLSHIFPFFLGAMVATI 215
                            IYF T +      +G+  A +
Sbjct: 218 ASATAAVAFTSAATRDRIYFGTDTRAQALLIGSAAAAL 255
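Because each alignment row carries its start and end coordinates, a row can be checked for dropped or duplicated characters: the number of residues, ignoring gap characters, should equal end minus start plus one. The Python sketch below illustrates that check under the assumption that gaps are written as hyphens; the function names are hypothetical.

```python
def ungapped_length(row: str, gap: str = "-") -> int:
    """Count residues in an alignment row, ignoring gaps and whitespace."""
    return sum(1 for ch in row if ch != gap and not ch.isspace())


def coordinates_consistent(start: int, row: str, end: int) -> bool:
    """True when the residue count matches the printed coordinates."""
    return ungapped_length(row) == end - start + 1


if __name__ == "__main__":
    # Subject row 218..255 from the Mycobacterium tuberculosis alignment.
    row = "ASATAAVAFTSAATRDRIYFGTDTRAQALLIGSAAAAL"
    print(ungapped_length(row), coordinates_consistent(218, row, 255))
```

A row that fails this check has probably lost characters in reproduction, for example where a pair of adjacent letters has been merged into one.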



A related DNA sequence was identified in S.pyogenes <SEQ ID 879> which encodes the amino acid
sequence <SEQ ID 880>. Analysis of this protein sequence reveals the following:

Possible site: 46

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood =-10.83 Transmembrane 325 - 341 ( 313 - 346)
INTEGRAL Likelihood = -9.29 Transmembrane 237 - 253 ( 234 - 258)
INTEGRAL Likelihood = -7.91 Transmembrane 166 - 182 ( 162 - 188)
INTEGRAL Likelihood = -6.10 Transmembrane 72 - 88 ( 66 - 92)
INTEGRAL Likelihood = -4.09 Transmembrane 264 - 280 ( 260 - 281)
INTEGRAL Likelihood = -2.87 Transmembrane 371 - 387 ( 370 - 390)
INTEGRAL Likelihood = -2.66 Transmembrane 34 - 50 ( 32 - 50)
INTEGRAL Likelihood = -1.91 Transmembrane 3 - 19 ( 3 - 19)
INTEGRAL Likelihood = -0.85 Transmembrane 136 - 152 ( 136 - 154)

Final Results 

bacterial membrane --- Certainty=0.5331 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below:

Identities = 167/226 (73%) , Positives = 195/226 (85%) 
Query: 1 MRIKWFSLVRITGLLLVLLYHFFKNSFPGGFVGVDIFFTFSGFLITALLIDEFSKTKKID 60 

MRIKWFS VR+TGLLLVLLYHFFKN FPGGF+GVDIFFTFSG+LITALLIDE++K + ID 
Sbjct: 1 MRIKWFSFVRVTGLLLVLLYHFFKNVFPGGFIGVDIFFTFSGYLITALLIDEYTKKESID 60 

Query: 61 FVSFCRRRFYRIFPPLVLMVLVTIPFVFLVKSDFRASIGSQIMTALGFTSNFYEILTGGN 120

+ F +RRFYRI PPLVLM+L+TIPF FL+K DF A+IGSQI LGFT+N YEILTG + 
Sbjct: 61 IIGFLKRRFYRIVPPLVLMILLTIPFTFLIKKDFIANIGSQITAVLGFTTNIYEILTGSS 120 

Query: 121 YESQFIPHLFVHTWSLSIEVHFYVLWGLTVWLLSKRSKDQKQLRGTLFLISMGIFGVSFL 180 

YESQFIPHLFVHTWSL+IEVHFY+ WG+ VWLL++R + QKQLRG LFLIS+GIF +SFL 
Sbjct: 121 YESQFIPHLFVHTWSLAIEVHFYLFWGVFVWLLARRKETQKQLRGLLFLISLGIFAISFL 180

Query: 181 TMFVRAFFVDNFSTIYFSTLSHIFPFFLGAMVATISGIREITGRFK 226 

+MF+R+F NFS IYFS+LSH FPFFLGAM ATI+GI E T RF+ 
Sbjct: 181 SMFIRSFMTSNFSLIYFSSLSHSFPFFLGAMFATITGINETTVRFQ 226 
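The Identities figure reported for an alignment is a count of columns in which both rows carry the same residue. As a small illustration, not taken from the original analysis, the Python sketch below counts identical columns for one pair of aligned rows; the function name `count_identities` is hypothetical, and it assumes both rows are already aligned to equal length with hyphens for gaps.

```python
from typing import Tuple


def count_identities(query: str, subject: str, gap: str = "-") -> Tuple[int, int]:
    """Return (identical columns, aligned columns) for two aligned rows."""
    if len(query) != len(subject):
        raise ValueError("aligned rows must have the same length")
    identical = sum(
        1 for q, s in zip(query, subject) if q == s and q != gap
    )
    return identical, len(query)


if __name__ == "__main__":
    # Final block (residues 181-226) of the GAS/GBS alignment above.
    query = "TMFVRAFFVDNFSTIYFSTLSHIFPFFLGAMVATISGIREITGRFK"
    subject = "SMFIRSFMTSNFSLIYFSSLSHSFPFFLGAMFATITGINETTVRFQ"
    identical, columns = count_identities(query, subject)
    print(f"Identities = {identical}/{columns}")
```

Summing the counts over all blocks of an alignment gives the overall Identities numerator and denominator; this sketch does not reproduce the similarity scoring behind the Positives figure.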

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 276 

A DNA sequence (GBSx0302) was identified in S.agalactiae <SEQ ID 883> which encodes the amino acid 
sequence <SEQ ID 884>. Analysis of this protein sequence reveals the following: 



>>> Seems to have a cleavable N-term signal seq.
Final Results

bacterial outside --- Certainty=0.3000 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

!GB:AE004818 hypothetical protein [Pseudomonas aerug 

!GB:AE004818 hypothetical protein [Pseudomonas aerug. 

!GB:AE004818 hypothetical protein [Pseudomonas aerug. 

!GB:AE004818 hypothetical protein [Pseudomonas aerug. 

!GB:AE004818 hypothetical protein [Pseudomonas aerug 



Query: 45 KYVGSIVNHHMTGKGKLTYENGDYYKGDFVNGVFEGKGTFVSVHGWSYTGDFKKGQPDGQ 104

+Y G +V+ + G+G+L Y+NG +Y G F +G+ G GT+ G Y+G F G DGQ 
Sbjct: 39 RYRGELVDGRLEGQGRLDYDNGAWYAGRFEHGLLHGHGTWQGADGSRYSGGFARGLFDGQ 98 

Query: 105 GRLNAKNKKVYKGTFKQGIY 124 

GRL + VY+G F+QG++ 
Sbjct: 99 GRLAMADGSVYQGGFRQGLF 118 
Identities = 31/91 (34%) , Positives = 45/91 (50%) , Gaps = 2/91 (2%) 

Query: 34 QGVFSYDGGKIKYVGSIVNHHMTGKGKLTYENGDYYKGDFVNGVFEGKGTFVSVHGWSYT 93

QG YD G YG + + G G +GYGF G+F+G+G G Y 

Sbjct: 52 QGRLDYDNGAW-YAGRFEHGLLHGHGTWQGADGSRYSGGFAAGLFDGQGRIAMADGSVYQ 110 

Query: 94 GDFKKGQPDGQGRLNAKNKKVYKGTFKQGIY 124 

G F++G DG+G h + + Y+G F++G+Y 
Sbjct: 111 GGFRQGLFDGEGSLEQQGTR-YRGGFRKGLY 140 
Identities = 31/91 (34%) , Positives = 42/91 (46%) , Gaps = 1/91 (1%) 

Query: 32 SSQGVFSYDGGKIKYVGSIVNHHMTGKGKLTYENGDYYKGDFVNGVFEGKGTFVSVHGWS 91 

S QG G +Y GS + G+G + G+ Y G F +G GKG + G 

Sbjct: 141 SGQGTLDGSDGS-RYQGSFRQGRLEGEGSFSDSQGNQYAGTFRDGQMGKGRWSGPDGDR 199 

Query: 92 YTGDFKKGQPDGQGRLNAKNKKVYKGTFKQG 122 

Y G FK Q GQGR + + V+ G F +G 
Sbjct: 200 YVGQFKDNQFHGQGRYESASGDVWIGRFSEG 230 
Identities = 31/91 (34%) , Positives = 45/91 (49%) , Gaps = 4/91 (4%) 

Query: 34 QGVFSYDGGK IKYVGSIVNHHMTGKGKLTYENGDYYKGDFVNGVFEGKGTFVSVHG 89 

QG+F +G +Y G +G+G L 4G Y+G F G EG+G+F G 

Sbjct: 115 QGLFDGEGSLEQQGTRYRGGFRKGLYSGQGTLDG3DGSRYQGSFRQGRLEGEGSFSDSQG 174 

Query: 90 WSYTGDFKKGQPDGQGRLNAKNKKVYKGTFK 120 

Y G F+ GQ +G+GR + + Y G FK 
Sbjct: 175 NQYAGTFRDGQLNGKGRWSGPDGDRYVGQFK 205 
Identities = 28/87 (32%) , Positives = 45/87 (51%) , Gaps = 1/87 (1%) 

+G FS G +Y G+ + + GKG+ + +GD Y G F + F G+G + S G + 
Sbjct: 166 EGSFSDSQGN-QYAGTFRDGQLNGKGRWSGPDGDRYVGQFKDNQFHGQGRYESASGDVWI 224 

Query: 94 GDFKKGQPDGQGRLNAKNKKVYKGTFK 120

G F +G +G G L + Y+G F+ 
Sbjct: 225 GRFSEGALNGPGELLGADGSRYRGGFQ 251 
Identities = 28/89 (31%) , Positives = 43/89 (47%) , Gaps = 2/89 (2%)

Query: 34 QGVFSYDGGKIKYVGSIVNHHMTGKGKLTYENGDYYKGDFVNGVFEGKGTFVSVHGWSYT 93

QG + G + Y G G+G L + G Y+G F G++ G+GT G Y 

Sbjct: 98 QGRIAMADGSV- YQGGFRQGIiFDGEGSIiE - QQGTRYRGGFRKGLYSGQGTLDGSDGSRYQ 155 



Query: 94 GDFKKGQPDGQGRLNAKNKKVYKGTFKQG 122 

G F++G+ +G+G + Y GTF+ G 

Sbjct: 156 GSFRQGRLEGEGSFSDSQGNQYAGTFRDG 184 
Identities = 25/80 (31%) , Positives = 37/80 (46%)

Query: 45 KYVGSIVNHHMTGKGKLTYENGDYYKGDFVNGVFEGKGTFVSVHGWSYTGDFKKGQPDGQ 104

+YVG ++ G+G+ 4GD + G F G G G + G Y G F+ + GQ 
Sbjct: 199 RYVGQFKDNQFHGQGRYESASGDVWIGRFSEGMJSIGPGELLGADGSRYRGGFQFWRFHGQ 258 

Query: 105 GRLNAKNKKVYKGTFKQGIY 124 

G L + Y+G F G Y 
Sbjct: 259 GLLEQLDGTRYEGGFAAGAY 278 

A related DNA sequence was identified in S.pyogenes <SEQ ID 885> which encodes the amino acid 
sequence <SEQ ID 886>. Analysis of this protein sequence reveals the following: 
Possible site: 35 



Final Results 

bacterial membrane --- Certainty=0.6265 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

>GP:BAA16606 GB:D90899 hypothetical protein [Synechocystis sp.] 
Identities = 37/89 (41%) , Positives = 49/89 (54%) , Gaps = 6/89 (6%) 

Query: 48 KGRMHYT- 
KG YT 

Sbjct: 141 

Query: 102 EFHKGQANGKGVLKAKNNKVYKGIFKQGI 130 

EF G+ +G+G N ++G FKQG+ 

Sbjct: 201 EFQSGEFSGQGTRIFANGNRFQGQFKQGL 229

An alignment of the GAS and GBS proteins is shown below: 

Identities = 68/126 (53%) , Positives = 93/126 (72%) 

Query: 1 MKNFKITRTHLEILSLIIIWFGLSVFTLTTSSQGVFSYDGGKIKYVGSIVNHHMTGKGK 60 

+K + ITR LEI+S+I + I+V +SVF++ S++ +YD G++ Y G ++NH M G+GK 
Sbjct: 8 WKWSITF<AKLEIVSVIVILVCAISVFSVRISNKT3LTYDKGRMHYTGYVINHKMNGEGK 67 

Query: 61 LTYENGDYYKGDFVNGVFEGKGTFVSVHGWSYTGDFKKGQPDGQGRLNAKNKKVYKGTFK 120

L Y NGD Y+G F +G+FEGKGTF +  GW Y G+F KGQ +G+G L AKN KVYKG FK
Sbjct: 68 LVYPNGDIYEGTFKDGLFEGKGTFTAKTGWLYNGEFHKGQANGKGVLKAKNNKVYKGIFK 127

Query: 121 QGIYQK 126 

QGI+QK 
Sbjct: 128 QGIFQK 133 

SEQ ID 884 (GBS139) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 19 (lane 3; MW 13kDa). It was also expressed in E.coli as a GST-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 22 (lane 2; MW 38.2kDa), in Figure 24 
(lane 7; MW 38kDa) and in Figure 33 (lane 7; MW 38.2kDa). 

The GBS139-GST fusion product was purified (Figure 200, lane 2) and used to immunise mice. The 
resulting antiserum was used for FACS (Figure 287), which confirmed that the protein is immunoaccessible 
on GBS bacteria. 
Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 277 

A DNA sequence (GBSx0303) was identified in S.agalactiae <SEQ ID 887> which encodes the amino acid 
sequence <SEQ ID 888>. This protein is predicted to be Holliday junction DNA helicase RuvB (ruvB).
Analysis of this protein sequence reveals the following: 

Possible site: 59 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4386 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database:

>GP:CAB75331 GB:Y15896 RuvB protein [Bacillus subtilis]
Identities = 196/322 (60%) , Positives = 254/322 (78%) 

Query: 3 RFLDSDRMGDEELVERTLRPQYLREYIGQDKVKDQLKIFIEAAKLRDESLDHVLLFGPPG 62
R + S+A E ++E++LRPQ L +YIGQ KVK+ L++FI+AAK+R E+LDHVLL+GPPG

Sbjct: 4 RLVSSFJiXJNHESVIEQSLRPQNLAQYIGQHKVKENLRVFIDAAKMRQETLDHVLliYGPPG 63 

Query: 63 LGKTTMAFVIANELGVNLKQTSGPAIEKSGDLVAILNDLEPGDVLFIDEIHRMPMAVEEV 122
LGKTT+A ++ANE+GV L+ TSGPAIE+ GDL AIL LEPGDVLFIDEIHR+ ++EEV
Sbjct: 64 LGKTTLASIVANEMGVELRTTSGPAIERPGDLAAILTALEPGDVLFIDEIHRLHRSIEEV 123

Query: 123 LYSAMEDFYIDIMIGAGETSRSVHLDLPPFTLIGATTRAGMLSNPLRARFGITGHMEYYE 182

LY AMEDF +DI+IG G ++RSV LDLPPFTL+GATTR G+L+ PLR RFG+ +EYY
Sbjct: 124 LYPAMEDFCLDIVIGKGPSARSVRLDLPPFTLVGATTRVGLLTAPLRDRFGVMSRLEYYT 183


Query: 183 ENDLTEIIERTADIFEMKITYEAASELARRSRGTPRIANRLLKRVRDYAQIMGDGLIDDN 242

+ +L +1+ RTAD+FE++I +A E+ARXSRGTPR+ANRLL+RVRD+AQ++GD I 4+ 
Sbjct: 184 QEELADIOTRTADVFEVEIDKPSALEIARRSRGTPRVA7<IRIiLRRVRDFAQVLGDSRITED 243 

Query: 243 ITDKALTMLDVDHEGLDYVDQKILRTMIEMYNGGPVGLGTLSVNIAEERDTVEDMYEPYL 302

It M LVD GLD++D K+L MIE +NGGPVGL T+S I EE T+ED+-YEPYL 
Sbjct: 244 ISQNALERLQVDRLGLDHIDHKLLMGMIEKFNGGPVGLDTISATIGEESHTIEDVYEPYL 303 

Query: 303 IQKGFIMRTRTGRVATVKAYEH 324
+Q GFI RT GR+ T Y H

Sbjct: 304 LQIGFIQRTPRGRIVTPAVYHH 325 

A related GBS nucleic acid sequence <SEQ ID 10943> which encodes amino acid sequence <SEQ ID 
10944> was also identified. 

A related DNA sequence was identified in S.pyogenes <SEQ ID 889> which encodes the amino acid
sequence <SEQ ID 890>. Analysis of this protein sequence reveals the following: 

Possible site: 26 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.0686 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below:

Identities = 282/327 (86%) , Positives = 306/327 (93%) 
Query: 1 MTRFLDSDAMGDEELVERTLRPQYLREYIGQDKVKDQLKIFIEAAKLRDESLDHVLLFGP 60

M R LD++ MG+EE +RTLRPQYL EYIGQDKVK+Q IFIEAAK RDESLDHVLLFGP
Sbjct: 25 MARILDNNVMGWEEFSDRTLRPQYLHEYIGQDKVKEQ?AIFIEAAKRRDESLDHVLLFGP 84

Query: 61 PGLGKTTMAFVIANELGVNLKQTSGPAIEKSGDLVAILNDLEPGDVLFIDEIHRMPMAVE 120

PGLGKTTMAFVIANELGVNLKQTSGPA+EK+GDLVAILN+LEPGD+LFIDEIHRMPM+VE
Sbjct: 85 PGLGKTTMAFVIANELGVNLKQTSGPAVEKAGDLVAILNELEPGDILFIDEIHRMPMSVE 144

Query: 121 EVLYSAMEDFYIDIMIGAGETSRSVHLDLPPFTLIGATTRAGMLSNPLRARFGITGHMEY 180 
EVLYSAMEDFYIDIMIGAG+TSRS+HLDLPPFTLIGATTRAGMLSNPLRARFGITGHMEY

Sbjct: 145 EVLYSAMEDFYIDIMIGAGDTSRSIHLDLPPFTLIGATTRAGMLSNPLRARFGITGHMEY 204 

Query: 181 YEENDLTEIIERTADIFEMKITYEAASELARRSRGTPRIANRLLKRVRDYAQIMGDGLID 240 
Y+E DLTEI+ERTA IFE+KI +EAA +LA RSRGTPRIANRLLKRVRDYAQI+GDG+I 
Sbjct: 205 YQEKDLTEIVERTATIFEIKIDHEAARKLACRSRGTPRIANRLLKRVRDYAQIIGDGIIT 264

Query: 241 DNITDKALTMLDVDHEGLDYVDQKILRTMIEMYNGGPVGLGTLSVNIAEERDTVEDMYEP 300 

ITD+ALTMLDVD EGLDY+DQKILRTMIEMY GGPVGLGTLSVNIAEER+TVE+MYEP 
Sbjct: 265 AQITDRALTMLDVDREGLDYIDQKILRTMIEMYQGGPVGLGTLSVNIAEERNTVEEMYEP 324 


Query: 301 YLIQKGFIMRTRTGRVATVKAYEHLGY 327 

YLIQKGF+MRTRTGRVAT KAY HLGY 
Sbjct: 325 YLIQKGFLMRTRTGRVATQKAYRHLGY 351 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 278 

A DNA sequence (GBSx0304) was identified in S.agalactiae <SEQ ID 891> which encodes the amino acid 

sequence <SEQ ID 892>. Analysis of this protein sequence reveals the following: 

Possible site: 43

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -2.87 Transmembrane 157 - 173 ( 157 - 174) 
INTEGRAL Likelihood = -1.49 Transmembrane 205 - 221 ( 205 - 222) 

Final Results

bacterial membrane --- Certainty=0.2147 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.

A related DNA sequence was identified in S.pyogenes <SEQ ID 893> which encodes the amino acid 
sequence <SEQ ID 894>. Analysis of this protein sequence reveals the following: 

Possible site: 56 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3097 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 130/303 (42%) , Positives = 202/303 (65%) 

Query: 1 MLKHFGSKVRNLRVTRNITREDFCGDETELSWQLARIESGQSIPNLTK7AHYIAKQLNVK 60 
ML+HFG KV+ LR+ + I+RED CGDE+ELSVRQLARIE GQSIP+L+K +IAK LNV

Sbjct: 1 MLEHFGGKVKVLRLEKRISREDLCGDESELSWQIiARIELGQSIPSLSKVIFIAKMiNVS 60 

Query: 61 LDILTGGESLELPKRYKELKYLILRIPTYADAERLKLRECQFDHIFEEFYDNLPEDECLA 120 
+ LT G LELPKRYKELKYLILR PTY D +L++RE QFD IFE++YD LPE+E + 
Sbjct: 61 VGYLTDGADLELPKRYKELKYLILRTPTYMDD3KLQVREEQFDEIFEDYYDKLPEEEKII 120

Query: 121 IDSLQAKFEVYQTODINFGVEVLCECTDKVKrKEKYTIJSIDLIlIDLFLTCaWSKFNNRA 180 

ID LQA + + + NFG+++L E F+++K K ++ NDLI+++L+L + + + 
Sbjct: 121 IDCLQATLDTLLSENTNFGIDLLQEYFNQIKTKVRFRQNDLILLELYIiAYLDIEGMDGQY 180 

Query: 181 FTKEVFQTICKTLISQNHKLTAFJJLFWFNHVLLNCVFVGLCLNSEECLflEMLEVSRQTMV 240 

K + ++ L Q + ++LF N ++++ + L N + L + +E+S++ M 
Sbjct: 181 SDKIFYDSLLDI^SEQFEQFELDELFIVfflCIIIDISSLSLKNNRLDNLEKAIEMSQKIMA 240 

Query: 241 STHDFHKMPLYFlOTQWKYFITIDNDIKSAEmYQQSIMFSKMIDDKHLlKKLELEWQEDI 300 

D+++MP+ + +WKYF+ DI AE ++ ++ +F++M D++L KIi EW++D+ 
Sbjct: 241 KIQDWMRMPILKLIEWKYFLIKQKDIIKAEQSFMKACLFAQMTADQYLENKLIQEWEKDV 300 

Query: 301 TGH 303 

Sbjct: 301 KSY 303 

SEQ ID 892 (GBS319) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 40 (lane 4; MW 37kDa). It was also expressed in E.coli as a GST-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 46 (lane 7; MW 62kDa). 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 279 

A DNA sequence (GBSx0305) was identified in S.agalactiae <SEQ ID 895> which encodes the amino acid 
sequence <SEQ ID 896>. This protein is predicted to be adenylosuccinate lyase (purB). Analysis of this 
protein sequence reveals the following: 

Possible site: 35 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3358 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 



Query: 1 MIERYSRPEMAAIWTEENKYRAWLEVEIIADEAWAELGEIPKEDVAKIREKADFDIDRIL 60 

MIERY+RPEM AIWTEEN+Y+AWLEVEI+A EAWAELGEIPKEDV KIRE A FD++RIL 
Sbjct: 1 MIERYTRPEMGAIWTEENRYQAWLEVEIVACEAWAELGEIPKEDVKKIREHASFDVERIL 60 

Query: 61 EIEQDTRHDVVAFTRAVSETLGEERKWVHYGLTSTDVVDTAYGYLYKQANDIIRRDLENF 120

EIEQ+TRHDVVAFTRAVSETLGEERKWVHYGLTSTDVVDTA YL KQAN+II DL F
Sbjct: 61 EIEQETRHDVVAFTRAVSETLGEERKWVHYGLTSTDVVDTALSYLLKQANEIIEADLVRF 120

Query: 121 TNIVADKAKEHKFTIMMGRTHGVHAEPTTFGLKLATWYSEMKRNIERFEHAAAGVEAGKI 180 

+1+ +KA EHK+T+MMGRTHGVHAEPTTFGLKLA WY EMKRN+ERF AA GV GK+ 
Sbjct: 121 LDILKEKALEHKYTVMMGRTHGVHAEPTTFGLKLALWYEEMKRNLERFRLAAEGVRVGKL 180 

Query: 181 SGAVGNFANIPPFVEQYVCDKLGIRPQEISTQVLPRDLHAEYFAVLASIATSIERMATEI 240

SGAVG +ANI PFVEQYVC+KLG+ ISTQ L RD HAEY A LA IATSIE+ A EI 
Sbjct: 181 SGAVGTYANIDPFVEQYVCEKLGLERAPISTQTLQRDRHAEYMATLALIATSIEKFAvEI 240 
Query: 301 RDISHSSAERIITPDTTILIDYMLNRFGNIVKNLTVFPENMMRNMESTFGLIYSQRVMLK 360

RDISHSSAERII PD TI I+YMLKRFGNIVKNLTVFPENM RNM T+GLIYSQRV+L 
Sbjct: 301 RDISHSSAERIILPDATIAIWMIiNRFGNIVKNLWFPEllIMIQlNMTRTYGLIYSQRVLLS 360 

Query: 361 LIEKGMTREEAYDLVQPKTAYSWDNQVDFKPLLEEDTKVTSCLTQEEIDELFNPIYYTKR 420

LI+KGM REEAYDLVQPK +W+ V F+ L+E++ ++TS L+ EEI+ F+ ++ K 
Sbjct: 361 LIDKGMVREEAYDLVQPKAMEAWEKGVQFRELVEQEERITSVLSPEEIEACFDYNHHLKH 420

Query: 421 VDDIFERLGL 430 

VD IFERLGL 
Sbjct: 421 VDTIFERLGL 430 

A related DNA sequence was identified in S.pyogenes <SEQ ID 897> which encodes the amino acid 
sequence <SEQ ID 898>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.3358 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 422/430 (98%) , Positives = 428/430 (99%) 



Query: 121 TNIVADKAKEHKFTIMMGRTHGVHAEPTTFGLKLATWYSEMKRNIERFEHAAAGVEAGKI 180

TNIVADKA+EHK TIMMGRTHGVHAEPTTFGLKLATWYSEMKRNIERFEHAAAGVEAGKI
Sbjct: 121 TNIVADKAREHKMTIMMGRTHGVHAEPTTFGLKLATWYSEMKRNIERFEHAAAGVEAGKI 180

Query: 181 SGAVGNFANIPPFVEQYVCDKLGIRPQEISTQVLPRDLHAEYFAVLASIATSIERMATEI 240

SGAVGNFANIPPFVE+YVCDKLGIRPQEISTQVLPRDLHAEYFAVLASIATSIERMATEI
Sbjct: 181 SGAVGNFANIPPFVEEYVCDKLGIRPQEISTQVLPRDLHAEYFAVLASIATSIERMATEI 240

Query: 241 RGLQKSEQREVEEFFAKGQKGSSAMPHKSNPIGSEKMTGLARVIRGHMVTAYENVALWHE 300
RGLQKSEQRE\7EEFFAKGQKGSSAM?HKRNPIGSFA^TGIiARVIRGHMVTAYEOT+LWHE 

Query: 301 RDISHSSAERIITPDTTILIDYMLNRFGNIVKNLTVFPENMMRNMESTFGLIYSQRVMLK 360

RDISHSSAERIITPDTTILIDYMLNRFGNIVKNLTVFPENMMRNMESTFGLIYSQRVMLK
Sbjct: 301 RDISHSSAERIITPDTTILIDYMLNRFGNIVKNLTVFPENMMRNMESTFGLIYSQRVMLK 360

Query: 361 LIEKGMTREEAYDLVQPKTAYSWDNQVDFKPLLEEDTKVTSCLTQEEIDELFNPIYYTKR 420 

LIEKGMTREEAYDLVQPKTAYSWDNQVDFKPLLEEDTKVTSCLTQEEIDELFNPIYYTKR 
Sbjct: 361 LIEKGMTREEAYDLVQPKTAYSWDNQVDFKPLLEEDTKVTSCLTQEEIDELFNPIYYTKR 420

Query: 421 VDDIFERLGL 430 

VDDIF+RLG+ 
Sbjct: 421 VDDIFKRLGI 430 



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
Example 280

A DNA sequence (GBSx0306) was identified in S.agalactiae <SEQ ID 899> which encodes the amino acid 
sequence <SEQ ID 900>. Analysis of this protein sequence reveals the following: 



Possible site: 45 
Final Results

bacterial membrane --- Certainty=0.7496 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database:



Query: 152 ILLLIAEVSIGKNR-VYNFVQNLNYFEEVrWNYFEENPVKIKEKSLIIK FLLTIS 205 

IL L SI NR + ++ N ++ N+F+ + +K K L+I F++ +S 

Sbjct: 133 ILFLYLlYSILINRFILKWLDNSGIIYKININWFKNHMIianNKMLVINlKFFNFIIKLS 192 

Query: 206 FVFVIDFAMVRL LNFNIKFSTILACSAILIAWLyQN— KSVTEPFL 249 

+ +1 +++ h +NF+I+ I I ++ S+ F 

Sbjct: 193 IITIIGlSIMELFGIFGIHFDIRIIIINyLKTINSGKIHLTIINMDQYSVLENSIHTIFY 252 

Query: 250 LKKLVIYFIFFIATLIGNLKN-ELSILETPLLFISIFFTMDRIIALSKEMRDLI--ISKS 306 

+ L+I+ IF L N+KN + +1 +L+I IF I ++DL+ +4K 

Sbjct: 253 INLLIIFLIFISLILYRNVKNIDTNIKRVIIILYILIFLINIIFIFNHIYIKDLMDNLNKY 312 

Query: 307 ILFYYDHENIKPSILLSEIKEIKYLENVDIGE LELVRQMVIRLRLELEEKFLILSDI 363 

IL Y D I S+ L ++K L+ ++I + V+ + 1+ ++E L + I 

Sbjct: 313 ILDYMDLHIIWSLFLFNKFDVK-LKKINIYKSYSTVTVKDLEIKSKIEERSNELDIKLI 371 

Query: 364 YMKNG-YEKYIQFVQGNVYFINLE- -LDKIPNYTNLKLILESIFD HNNQKIFIPKL 416 

K G YE YI ++ N+ +4 E Ii B Y N +E + + f F+ K+ 

Sbjct: 372 IAKYGSYENYINSIE-NINIVDEEFILKNYPEYINDSKFIEFLMELEPLFRDHTEFVKKI 430 

Query: 417 YEEYIYILISLGEVEKAKEIL KEVSDYLTEESL 449 

YE L + K+IL KE+ DY+ + 4L 

Sbjct: 431 YENLNSTNEKLEFLIiANKDILSENKEIFDYVLQLKL 466 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 281 

A DNA sequence (GBSx0308) was identified in S.agalactiae <SEQ ID 901> which encodes the amino acid 
sequence <SEQ ID 902>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence



Final Results 

bacterial cytoplasm --- Certainty=0.3307 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 282 

A DNA sequence (GBSx0309) was identified in S.agalactiae <SEQ ID 903> which encodes the amino acid 
sequence <SEQ ID 904>. This protein is predicted to be purK (purK). Analysis of this protein sequence
reveals the following: 
Possible site: 34 

>>> Seems to have no N-terminal signal sequence 

Final Results

bacterial cytoplasm --- Certainty=0.0334 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9461> which encodes amino acid sequence <SEQ ID 9462>
was also identified. 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAA04376 GB:AJ000883 purK [Lactococcus lactis] 
Identities = 208/347 (59%), Positives = 258/347 (73%), Gaps = 3/347 (0%) 

Query: 14 NSFKTIGIIGGGQLGQMMAIAAIYMGHKVITLDPASDCPASRVS-EVIVAPYDDVEALGT 72

N+ +TIGIIGGGQLGQMMAIAA YMGHKVITLDP +C A++VS E+IVAPYDDVE L
Sbjct: 4 NTKQTIGIIGGGQLGQMMAIAAQYMGHKVITLDPNPNCSAAKVSDELIVAPYDDVENLLR 63

Query: 73 LAARCDVLTYEFENVDADGLDAVVSAGQLPQGTDLLRISQNRIFEKDFLANKAGVTVAPY 132

LA CDV+TYEFENV A L + ++PQG LL I+QNR FEK+FL N+A V VAP+
Sbjct: 64 IAYACDVITYEFENVSAPCALHEIEGCVRIPQGIFJjI£ITQNRRFEKEFLTNEAPWNVAPVf 123

Query: 133 KVVTSSLDLEGLDLTKIYVLKTATGGYDGHGQKVIRSAEDLPEAQQIANSAQOTLEEFVN 192 
++V S+ L +T+ VLKT TGGYDGHGQ V+ + E L A+ L ++CVLE+F++

Sbjct: 124 QLVDSAEKLPET-VTRKQVLKTTTGGYDGHGQVVLNTDEKLSAAKSLTELSECVIjEDFIS 182 

Query: 193 FDLEISVIVSGNGQDVTVFPVQENIHRNNILSKTIVPARISDQLADKAKEMAVQIAKKLQ 252
F+ EISVI+SGNG + VFP+ EN HR NIL +TI PARIS ++ + A ++A IA+KL+
Sbjct: 183 FEREISVIISGNGHEYVVFPIAENEHRENILHQTISPARISAEITENAYKIATSIAEKLE 242

Query: 253 LSGTLCVEMFATAD-DIIVNEIAPRPHNSGHYSIEACDFSQFDTHILGVLGAPLPPIKLH 311

LSG LCVEMF TAD I VNE+APRPHNSGH++IEACDF+QFD HI G+LG LP KL
Sbjct: 243 LSGVLCVEMFLTADGQIYVNELAPRPHNSGHFTIEACDFNQFDLHIKGILGEDLPEPKLL 302


Query: 312 APAVMENVLGQHVQQAIDHVAQNPSAHLHMYGKLEAKHNRKMGHVTV 358

PA+M NVLGQHV+ ++ H H YGK +AKHNRKMGHVT+ 

Sbjct: 303 KPAIMLNVLGQHVEAWKI^NHEHADITOQHDYGKADAKHNRKMGHVTI 349 

A related DNA sequence was identified in S.pyogenes <SEQ ID 905> which encodes the amino acid
sequence <SEQ ID 906>. Analysis of this protein sequence reveals the following: 

Possible site: 34 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.0334 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 344/369 (93%) , Positives = 353/369 (95%) 

Query: 1 MRNKKKSQRSQAMNSFKTIGIIGGGQLGQMMAIAAIYMGHKVITLDPASDCPASRVSEVI 60

MRNK+KSQRSQ +NSFKTIGIIGGGQLGQMMAIAAIYMGHKVITLDPASD PASRVSEVI
Sbjct: 1 MRNKEKSQRSQVVNSFKTIGIIGGGQLGQMMAIAAIYMGHKVITLDPASDSPASRVSEVI 60

Query: 61 VAPYDDVEALGTLAARCDVLTYEFENVDADGLDAVVSAGQLPQGTDLLRISQNRIFEKDF 120

VAPYDDVEALG LAARCDVLTYEFENVDADGLDAVVSA QLPQGTDLLRISQNRI EKDF
Sbjct: 61 VAPYDDVEALGQLAARCDVLTYEFENVDADGLDAVVSACQLPQGTDLLRISQNRIVEKDF 120

Query: 121 LANKAGVTVAPYKVVTSSLDLEGLDLTKTYVLKTATGGYDGHGQKVIRSAEDLPEAQQLA 180

LANKAGVTVAPYKVVTSSLDL GLDLTKTYVLKT TGGYDGHGQK+IRSAEDLPEAQQLA
Sbjct: 121 LANKAGVTVAPYKVVTSSLDLGGLDLTKTYVLKTETGGYDGHGQKIIRSAEDLPEAQQLA 180

Query: 181 NSAQCVLEEFVNFDLEISVIVSGNGQDVTVFPVQENIHRNNILSKTIVPARISDQLADKA 240

NSAQCTLEEFVNFDLEISVIVSGNG+DVTVFPVQENIHRNNILSKTIVPARISDQLADKA
Sbjct: 181 NSAQCTLEEFVNFDLEISVIVSGNGKDVTVFPVQENIHRNNILSKTIVPARISDQLADKA 240

Query: 241 KEMAVQIAKKLQLSGTLCVEMFATADDIIVNEIAPRPHNSGHYSIEACDFSQFDTHILGV 300

K+ AVQIAKKLQLSGTLCVEMF TADDIIVNEIAPRPHNSG YSIEACDFSQFDTHILGV
Sbjct: 241 KKTAVQIAKKLQLSGTLCVEMFTTADDIIVNEIAPRPHNSGRYSIEACDFSQFDTHILGV 300

Query: 301 LGAPLPPIKLHAPAVMFNVLGQHVQQAIDHVAQNPSAHLHMYGKLEAKHNRKMGHVTVFS 360

LGAPLP I+LHAPAVM NVLGQHVQQA D+VA+NPSAHLHMYGKLEAKHNRKMGHVTVF+
Sbjct: 301 LGAPLPQIQLHAPAVMLNVLGQHVQQATDYVAKNPSAHLHMYGKLEAKHNRKMGHVTVFA 360

Query: 361 DVPDEVEEF 369 
DEV+EF 

Sbjct: 361 KDADEVKEF 369 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 283 

A DNA sequence (GBSx0310) was identified in S.agalactiae <SEQ ID 907> which encodes the amino acid 
sequence <SEQ ID 908>. This protein is predicted to be phosphoribosylaminoimidazole carboxylase 
catalytic subunit (purE). Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.3572 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAB12462 GB:Z99107 phosphoribosylaminoimidazole carboxylase I
[Bacillus subtilis]
Identities = 106/162 (65%) , Positives = 128/162 (78%) 

Query: 33 MQPIISIIMGSKSDWTTMQKTAEVLDNFGIAYEKKVVSAHRTPDLMFKHAEEARGRGIKI 92

MQP++ IIMGS SDW TM+ ++LD + YEKKVVSAHRTPD MF++AE AR RGIK+
Sbjct: 1 MQPLVGIIMGSTSDWETMKHRCDILDELNVPYEKKVVSAHRTPDFMFEYAETARERGIKV 60



Query: 93 IIAGAGGAAHLPGMVAAKTTLPVIGVPVKSRALSGLDSLYSIVQMPGGVPVATMAIGEAG 152 
IIAGAGGAAHLPGM AAKTTLPVIGVPV+S+AL+G+DSL SIVQMPGGVPVAT +IG+AG
Sbjct: 61 IIAGAGGAAHLPGMTAAKTTLPVIGVPVQSKAIIJGMDSLLSIVQMPGGVPVATTSIGKAG 120

Query: 153 ATNAALTALRILSIEDQNLADALAHFHEEQGKIAEESSNELI 194

A NA L A +ILS D++LA L E + ESS++L+ 
Sbjct: 121 AVmGLI^QILSAFDEDLARKLDERRENTKOTVEESSDQLV 162 

A related DNA sequence was identified in S.pyogenes <SEQ ID 909> which encodes the amino acid 

sequence <SEQ ID 910>. Analysis of this protein sequence reveals the following: 

Possible site: 57 
>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -3.08 Transmembrane 36 - 52 ( 34 - 52) 

Final Results 

bacterial membrane --- Certainty=0.2232 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 



Query: 46 ISIIMGSKSDWATMQKTAEVLDNFGIAYEKKVVSAHRTPDLMFKHAEEARGRGIKIIIAG 105

++IIMG SDWATM++TA++LD+FG+AYEKKVVSAHRTP LM + + +AR RG K+IIAG
Sbjct: 4 VAIIMGCSSDWATMKETAKILDDFGIAYEKKVVSAHRTPALMAEFSSQARERGYKVIIAG 63

Query: 106 AGGAAHLPGMVAAKTTLPVIGVPVKSRALSGLDSLYSIVQMPGGVPVATMAIGEAGATNA 165 

AGGAAHLPGMV+A+T +PVIGVP+KSRALSGLDSLYSIVQMP GVPVATMAIGEAGA NA 
Sbjct: 64 AGGAAHLPGMVSAQTLVPVIGVPIKSRALSGLDSLYSIVQMPAGVPVATMAIGEAGAKNA 123 

Query: 166 ALTALRILSIEDQNLADALAHFHEEQGKIAEESSGELI 203 

AL AL++L+ ++NL L + ++ EES+ L+ 

Sbjct: 124 ALFALQLLANTMENIjIQKLLVYRAAAQEMVEESWKALL 151 

An alignment of the GAS and GBS proteins is shown below: 

Identities = 162/169 (95%) , Positives = 164/169 (96%) , Gaps = 1/169 (0%) 

Query: 27 PLYLNIMQ-PIISIIMGSKSDWTTMQKTAEVLDNFGIAYEKKVVSAHRTPDLMFKHAEEA 85

PL + IM+ PIISIIMGSKSDW TMQKTAEVLDNFGIAYEKKVVSAHRTPDLMFKHAEEA
Sbjct: 35 PLCILIMKTPIISIIMGSKSDWATMQKTAEVLDNFGIAYEKKVVSAHRTPDLMFKHAEEA 94

Query: 86 RGRGIKIIIAGAGGAAHLPGMVAAKTTLPVIGVPVKSRALSGLDSLYSIVQMPGGVPVAT 145

RGRGIKIIIAGAGGAAHLPGMVAAKTTLPVIGVPVKSRALSGLDSLYSIVQMPGGVPVAT
Sbjct: 95 RGRGIKIIIAGAGGAAHLPGMVAAKTTLPVIGVPVKSRALSGLDSLYSIVQMPGGVPVAT 154

Query: 146 MAIGEAGATNAALTALRILSIEDQNLADALAHFHEEQGKIAEESSNELI 194

MAIGEAGATNAALTALRILSIEDQNLADALAHFHEEQGKIAEESS ELI
Sbjct: 155 MAIGEAGATNAALTALRILSIEDQNLADALAHFHEEQGKIAEESSGELI 203

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 284 

A DNA sequence (GBSx0311) was identified in S.agalactiae <SEQ ID 911> which encodes the amino acid
sequence <SEQ ID 912>. This protein is predicted to be phosphoribosylglycinamide synthetase (purD). 
Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence
Final Results

bacterial cytoplasm --- Certainty=0.1966 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

= 7/419 (1%)

Query: 1 MKLLVVGSGGREHAIAKKLLASKDVDQVFVAPGNDGMTLDGLDLVNIGISEHSRLIDFVK 60
MK+LV+GSGGREHA+AKK + S V++VFVAPGN GM DG+ +V+I + +L+ F +
Sbjct: 1 MKILVIGSGGREHALAKKFMESPQVEEVFVAPGNSGMEKDGIQIVHISELSNDKLVKFAQ 60

Query: 61
Sbjct: 61

Query: 121
Y TF E A AY++E+G P+V+KADGLA GKGV VA +E A A +
Sbjct: 121

Query: 181
+WIEEFLDGEEFSLF+F + K Y MP AQDHKRA+D DKG NTGGMGAY+PV H+
Sbjct: 176

Query: 241
W+ A+E +VKP + GMI EG+ + GVLYAGLILT DG K IEFN+RFGDPETQ++LPR
Sbjct: 236 EVVNEALEKVVKPTVAGMIEEGKSFTGVLYAGLILTEDGVKTIEFNARFGDPETQVVLPR 295

Query: 301 LTSDFAQNIDDIMMGIEPYITWQKDGVTLGVVVASEGYPLDYEKGVPLPEKTDGDIITYY 360
L SD AQ I DI+ G EP + W + GVTLGVVVA+EGYP + G+ LPE +G + YY
Sbjct: 296 LKSDIAQAIIDILAGNEPTLEWLESGVTLGVVVAAEGYPSQAKLGLILPEIPEG-LNVYY 354

Query: 361 AGAKFAENSKALLSNGGRVYMLVTTEDSVKAGQDKIYTQLAQQDTTGLFYRNDIGSKAI 419
AG EN++ L+S+GGRVY++ T + VK+ Q +Y +L + + G FYR+DIGS+AI
Sbjct: 355 AGVSKNENNQ-LISSGGRVYLVSETGEDVKSTQKLLYEKLDKLENDGFFYRHDIGSRAI 412

A related DNA sequence was identified in S.pyogenes <SEQ ID 913> which encodes the amino acid
sequence <SEQ ID 914>. Analysis of this protein sequence reveals the following:

Possible site: 35 
>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -0.80 Transmembrane 5 - 21 ( 5 - 21)



Final Results 

bacterial membrane --- Certainty=0.1319 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

>GP:CAA04374 GB:AJ000883 purD [Lactococcus lactis]
Identities = 236/419 (56%), Positives = 301/419 (71%), Gaps = 7/419 (1%) 

Query: 50 LKLLVVGSGGREHAIAKKLLASKGVDQVFVAPGNDGMTLDGLDLVNIVVSEHSRLIAFAK 109

+K+LV+GSGGREHA+AKK + S V++VFVAPGN GM DG+ +V+I + +L+ FA+ 
Sbjct: 1 MKILVIGSGGREHALAKKFMESPQVEEVFVAPGNSGMEKDGIQIVHISELSNDKLVKFAQ 60 

Query: 110 ENEISWAFIGPDDALAAGIVDDFNSAGLRAFGPTKAAAELEWSKDFAKEIMVKYNVPTAA 169

I F+GP+ AL G+VD F A L FGP K AAELE SKDFAK IM KY VPTA 
Sbjct: 61 NQNIGLTFVGPETALmGVVDAFIKAELPIFGPNKMAASLEGSKDFAKSIMKKYGVPTAD 120 

Query: 170 YGTFSDFEKAKAYIEEQGAPIVVKADGLALGKGVVVAETVEQAVEAAQEMLLDNKFGDSG 229

Y TF E A AY++E+G P+V+KADGLA GKGV VA +E A A ++ F S 

Sbjct: 121 YATFDSLEPALAYLDEKGVPLVIKADGLAAGKG\T^AFDIETAKSALADI FSGSQ 175 

Query: 230 ARWIEEFLDGEEFSLFAFANGDKFYIMPTAQDHKRAFDGDKGPNTGGMGAYAPVPHLPQ 289 
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+WIEEFLDGEEFSLF+F + K Y MP AQDHKRAFD DKGPNTGGMGAY+PV H+ + 
Sbjct: 176 GKOTIEEFLDGEEFSLFSFIHDGKIYPMPIAQDHKRAFDEDKGPJWGGMGAYSPVLHISK 235 

Query: 290 SVVDTAVEMIWPVLEGWAEGRPYLGVLYVGLILTADGPKVIEFNSRFGDPETQririPR 349 

W+ A+E +V+P + GM+ EG+ +■ GVLY GLILT DG K IEFN+RFGDPETQ++LPR 
Sbjct: 236 EWNBmEKWKPTVAGMIEEGKSFTGVIJYAGLILTEDG^7KTIEFNARFGDPETQVVl J PR 295 

Query: 350 LTSDFAQNIDDimGIEPYITWQKDGVTLGVWASEGYPFDYEKGVPLPEKTDGDIITYY 409 

L SD AQ I DI+ G EP + W + GVTLGVWA+EGYP + G+ LPE +G + Ti 
Sbjct: 296 LKSDLAQAIIDILAGlffiPTLEWLESGVTLGVVVAAEGYPSQAKLGLILPEIPEG-IJWYY 354 

Query: 410 AGVKFSENSEr,LLSNGGRVYMLVTTEDSVKAGQDKIYTQIAQQDTTGLFYRriDIGSKAI 468 

AGV +EN++ L+S+GGRVY++ T + VK+ O +Y +h + + G FYR+DIGS+AI 
Sbjct: 355 AGVSKOTNWQ-L1SSGGRVYLVSETGEDVKSTQKLLYEKLDKLENDGFFYRHDIGSRAI 412 

An alignment of the GAS and GBS proteins is shown below: 

Identities = 399/421 (94%), Positives = 408/421 (96%) 

Query: 1 MKLLWGSGGRSHAIAKKLLASKDVDQVFVAPGiSlDGMTLDGIjDLVNIGISEHSRLrDFVK 60 

+KLLWGSGGRBHAIAKKLLASK VDQVFVAPGNDGMTLDGLDLVNI +SEHSRLI F K 
Sbjct: 50 LKLLWGSGGREmiAKKLLASKGVDQVFVAPGMDGMTLDGLDI.WIWSEHSRI.rAFAK 109 

Query: 61 EWEIAWTIiIGPIX)ALAAGIVDGFWSAGLRAFGPTKAAAELEWSKDFAKSIM\ncYWPTAA 120 

ENEI+W IGPDDALAAGIVD FNSAGLRAFGPTKAAAELEWSKDFAKElMVKyNVPTAA 
SbjCt: 110 ENEISWAFIGPDDAU^GIVDDFNSAGLRAFGPTKAAAELEWSKDFRKEIMVKYWPT^ 169 

Query: 121 YGTFSDFEKAKAYIEEQGAPIWKADGLALGKGVWAETVEQAVEAAQEMLIiDNKFGDSG 180 

YGTFSDFEKAKRYIEEO^PIWKADGLALGKGWVAETVEQAVEAAQEMLLDNKFGDSG 
SbjCt: 170 YGTFSDFEKAKflYIEEQGAPIVVKADGLALGKGWVAETVEQAVEAflQEMIjLDNKFGDSG 229 

Query: 181 ARWIEEFLDGEEFSLFAFANGDKFYIMPTAQDHKRAYDGDKGLNTGGMGAYAPVPHLPQ 240 

ARWIEEFLDGEEFSLFAFANGDKFYIKPTAQDKKRA+DGDKG NTGGMGAYAPVPHLPQ 
SbjCt: 230 ARWIEEFLDGEEFSLFAFANGDKFYIMPTAQDHKRAFDGDKGPNTGGMGAYAPVPHLPQ 289 

Query: 241 SVVDTAWlVKPVLEGMIAEGRPYLGVLYAGMLTADGPiO/IEFNSRFGDPETQIILPR 300 

SWOTAVE IV+PVLEGM+AEGRPYLGVLY GblLTADGPKVIEFNSRFGDPETQIILPR 
SbjCt: 290 SWDTAVEMIVRPVLEGMVAEGRPYLGVLYVGLILTADGPKVIEFNSRFGDPETQ1ILPR 349 

Query: 301 LTSDFAQNIDDIMMGlEPYITWQKDGVTLGVWASEGyPLDYEKGVPLPEKTDGDIITYY 360 

LTSDFAQNIDDIMMGIEPYITWQKDGVTLGVWASEGYP DYEKGVPLPEKTDGDIITYY 
Sbjct: 350 LTSDFAQNIDDIMMGIEPYITWQKDGVTLGWVASEGYPFDYEKGVPLPEKO'DGDIITYY 409 

Query: 361 AGAKFAENSKALLSNGGRVYMLVTTEDSVKAGQDKIYTQLAQQDTTGLFYRHDIGSKAIKE 421 

AG KF+ENS+ LLSNGGRVYMLVTTEDSVKAGQDKIY'TQIAQQDTXGLFYRNDIGSKAI+E 
SbjCt: 410 AGVKFSENSELLLSNGGRVYMLVTTEDSVKAGQDKIYTQLAQQDTTGLFYRNDIGSKAIRE 470 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
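
The percentages printed in the alignment headers of this example appear to be truncated rather than rounded (301/419 is about 71.8%, yet the header shows 71%). The following minimal sketch parses one such header and recomputes the figures with integer division. The function names `parse_header` and `truncated_percent` are illustrative assumptions, not part of any alignment tool.

```python
import re

# Header text copied from the S.pyogenes / Lactococcus lactis purD alignment above.
HEADER = "Identities = 236/419 (56%), Positives = 301/419 (71%), Gaps = 7/419 (1%)"


def parse_header(line):
    """Return {label: (count, total, printed_percent)} for one alignment header."""
    fields = {}
    pattern = r"(\w+)\s*=\s*(\d+)/(\d+)\s*\((\d+)%\)"
    for label, count, total, pct in re.findall(pattern, line):
        fields[label] = (int(count), int(total), int(pct))
    return fields


def truncated_percent(count, total):
    """Percentage with the fractional part dropped (no rounding)."""
    return 100 * count // total


stats = parse_header(HEADER)
recomputed = {label: truncated_percent(c, t) for label, (c, t, _) in stats.items()}
printed = {label: pct for label, (_, _, pct) in stats.items()}
```

Under this assumption the recomputed values agree with the printed ones for all three fields, and the same holds for the GAS/GBS header (399/421 and 408/421 give 94 and 96).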

Example 285 

A DNA sequence (GBSx0312) was identified in S.agalactiae <SEQ ID 915> which encodes the amino acid 
sequence <SEQ ID 916>. Analysis of this protein sequence reveals the following: 
Possible site: 36 

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -1.28 Transmembrane 235 - 251 ( 235 - 251) 



Final Results 

bacterial membrane --- Certainty=0.1510 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>
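
A "Final Results" block of this kind can be read mechanically even when the scanned text carries stray spaces inside the numbers. The sketch below is only an illustration: the regular expression and the names `parse_final_results` and `best` are assumptions made here, and the input is the scanned text of the block above, not output regenerated from the prediction program.

```python
import re

# Scanned text of the three localisation lines, including OCR spacing errors.
RAW = """\
bacterial membrane Certainty=0 .1510 (Affirmative) < suco
bacterial outside --- Certainty=0 . 0000 (Not Clear) < suco
bacterial cytoplasm Certainty=0. 0000 (Not Clear) < suco
"""

# site name, optional dashes, certainty value (digits, dot, stray spaces), verdict
PATTERN = re.compile(
    r"bacterial\s+(\w+)\W*Certainty\s*=\s*([\d.\s]+?)\s*\(([^)]*)\)"
)


def parse_final_results(text):
    """Return a list of (site, certainty, verdict) tuples."""
    rows = []
    for site, number, verdict in PATTERN.findall(text):
        certainty = float(re.sub(r"\s+", "", number))  # "0 .1510" -> 0.151
        rows.append((site, certainty, verdict))
    return rows


rows = parse_final_results(RAW)
best = max(rows, key=lambda row: row[1])  # highest-certainty localisation
```

Here `best` is the membrane entry, the only localisation marked "Affirmative" in the block.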

The protein has homology with the following sequences in the GENPEPT database:

Query: 1 MTIYDQIESALDLMTDLEREIACYFMGQPISKE'ALASTIVTKQLHISQAALTRFAKKCGF 60
M I +Q+E+ T E+ + Y + + +1+ K+ + +A +TRF KK GF
Sbjct: 1 MGILEQLENPKFKATKSEKTLIEYIKSDLDNIIYKSISIIAKESGVGEATITRFTKKLGF 60

+M+ S +1

Query: 113 LEVSHMIEQADRVYFYGKGSSSLVAKEFXIRLKRLGVI CEALDDTDS FSWTNS IVNDRCL 172
+ +1 A RVYF G G S + A + + MR+G + D+ + +SI ND +
Sbjct: 121 CKCRDLIMNAKRVYFIGIGYSGIAATDINYKFMRIGFTTVPVTDSHTMVIMSSITNDDDV 180

S +IP + ++D++Y +



A related DNA sequence was identified in S.pyogenes <SEQ ID 917> which encodes the amino acid 
sequence <SEQ ID 918>. Analysis of this protein sequence reveals the following: 

Possible site: 60

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -4.88 Transmembrane 243 - 259 ( 242 - 261) 

Final Results 

bacterial membrane --- Certainty=0.2954 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related sequence was also identified <SEQ ID 9093> which encodes the amino acid sequence <SEQ ID 
9094>. Analysis of this protein sequence reveals the following: 

Possible cleavage site: 56
>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -4.88 Transmembrane 239 - 255 ( 238 - 257) 

Final Results 

bacterial membrane --- Certainty=0.295 (Affirmative) < succ>
bacterial outside --- Certainty=0.000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 138/263 (52%) , Positives = 189/263 (71%) , Gaps = 2/263 (0%)

AL AS GAKTVL T T D + D II V+S L YGNR+SPQ P+LIM+DII

Query: 245 YAQFLDINKIEKERIFRETIIQR 267
YA L I+K KE+IF+ TII +
Sbjct: 253 YAYVLAIDKPHKEKIFKNTIIDK 275

SEQ ID 916 (GBS320) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 40 (lane 5; MW 33kDa). It was also expressed in E.coli as a GST-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 85 (lane 7; MW 58kDa) and in Figure 
160 (lane 7 & 8; MW 58kDa). 

GBS320-GST was purified as shown in Figure 224, lane 3-4. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.



Example 286 

A DNA sequence (GBSx0313) was identified in S.agalactiae <SEQ ID 919> which encodes the amino acid
sequence <SEQ ID 920>. This protein is predicted to be xylan esterase 1 (cephalosporin-C). Analysis of this 
protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.4981 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database:

>GP:AAB68821 GB:AF001926 xylan esterase 1 [Thermoanaerobacterium sp.
JW/SL YS485]

Identities = 133/299 (44%), Positives = 188/299 (62%), Gaps = 1/299 (0%)

Query: 5 MSLDDMREYLGQDQIPEDFDDFWKKQTMKYQG-SIEYRLDKKDFNITFAQAYDLHFKGSN 63 

M L +REY G + PEDFD++W + ++++L+F ++FA+ YDL+F G 
Sbjct: 6 MPLQKLREYTGTNPCPEDFDEYWNRALDEMRSVDPKIELKESSFQVSFAECYDLYFTGVR 65 

Query: 64 NSIVYAKCLFPKIWKPYPWFYFHGYQNQSPDWSDQLNYVflAGYGVVSMDVRGQAGQSQD 123 

+ ++AK + PKT +P + FHGY + S DW+D+LNYVAAG+ W+MDVRGQ GQSQD 
Sbjct: 66 GARIHAKYIKPKTEGKHPALIRFHGYSSNSGDWHDKLNYVAAGFTWAMDVRGQGGQSQD 125 

Query: 124 KGHFDGITVKGQIVRGMISGPNHLFYKDIYLDVFQLIDIIATLESVDSNQLYSYGWSQGG 183 
G G T+ G I+RG+ +++ +4- I+LD QL 1+ + VD +++ G SQGG 

Query: 184 ALALIAAAI^PKIVKTVAVYPFLSDFRRVIjDI/^VSEPYDELFRYFKYSDPFHKTENNVL 243 

L+L AAL P++ K V+ YPFLSD++RV DL Y E+ YF+ DP H+ EN V 

Sbjct: 186 GLSIACAALEPRWKWSEYPFLSDYKRVWDLDLAICCVYQEITDYFRLFDPRHERENEVF 245 

Query: 244 KTIAYIDVKNFAHRISCPVVLLTALKDDICPPSTQFAIFNRLTSTKKHLLLPDYGHDPM 302 

L YIDVKN A RI V++ L D +CPPST FA +N + S K + PDYGH+PM 
Sbjct: 245 TKLGYIDVI<NLAKRIKGDVLMCVGL>1DQVCPPSWFAAYNNIQSKKDIKVYPDYGHEPM 304 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 287

A DNA sequence (GBSx0314) was identified in S.agalactiae <SEQ ID 921> which encodes the amino acid 
sequence <SEQ ID 922>. Analysis of this protein sequence reveals the following: 

Possible site: 35

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -5.73 Transmembrane 128 - 144 ( 126 - 145) 



Final Results 

bacterial membrane --- Certainty=0.3293 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 



Query: 131 CLTIGTGIGGCLIIDKTVFHGFSNSACEVGYMHLSDGDFQDLASTTALIADVAKAHGDEI 190 

CLTIGTGIGG LIID V HGFSNSA E+GYM ++ + QD+AS +AL+ 4VA G E 
Sbjct: 18 CLTIGTGIGGALIIDGKVLHGFSNSAGEIGYMviVKGENIQDIASASALVKNVALRKGVEP 77 

Query: 191 SRWDGRRIFQEAKKGNEKCIASIDRMINYLGQGIANMVYVVNPEKVVLGGGIMAQKDYLQ 250 

S DGR + + G+ C ++++ + L GI+N+VY++NPE WLGGGIMA+++ + 
Sbjct: 78 SSIDGRYVLDNYENGDLICKEEVEKLADNLALGISNIV 137 

Query: 251 DKLSESLKRNLOTSLAEKTAIVFAQHENQAGYLGAYYHFK 290

+ SL++ L+ S+ T I FA+ +N AGM GAYY+FK
Sbjct: 138 PLIENSLRKYLIESVYNNTKIAFAKLKNTAGMKGAYYNFK 177

A related DNA sequence was identified in S.pyogenes <SEQ ID 923> which encodes the amino acid 
sequence <SEQ ID 924>. Analysis of this protein sequence reveals the following: 

Possible site: 22

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -4.30 Transmembrane 128 - 144 ( 127 - 145) 
INTEGRAL Likelihood = -0.11 Transmembrane 227 - 243 ( 227 - 243) 

Final Results 

bacterial membrane --- Certainty=0.2720 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the databases: 

>GP:BAB04516 GB:AP001509 glucose kinase [Bacillus halodurans] 
Identities = 97/291 (33%) , Positives = 155/291 (52%) , Gaps = 14/291 (4%) 

Query: 5 LAIDIGGTAIKYGLISETGDLLEKEEMATEAYKGGPSILEKVKGLVKTYQDQMDLAGVAI 64
+ ID+GGT IK L+S+ G+++ +E TEA +G ++ K+ L + D AG+ I
Sbjct: 3 VGIDLGGTKIKAALVSDAGEIISVQECPTEAAQGPEEVMNKMMSLTEKVTDHQPFAGIGI 62

+++ND N A LAEA+ G

LTI TGIGG + + + HG+S A E+G

TA+ + +G + R +F+Q + GD + + +DYL GIANI + +NP+
GTAIGRMARERFG---VEGGTREVFDQIRRGDHDMQRLVEEAMDYLAIGIANIAHTINPD 238

Query: 235 VVVLGGGIMAQKDYLADKLKTiUljDSYLVSSIJ^CKTQLKFASHGNNAGILGA 285 

V VLGGG+M D + +K + YL IA+ T + A G ++G+LGA 
Sbjct: 239 Vi^GGGVMNADDLILPIVKEKVSRYLYPGLAQSTTIVKAPCLGGDSGVLGA 289 

An alignment of the GAS and GBS proteins is shown below: 

Identities = 192/292 (65%) , Positives = 237/292 (80%) 

1IKHGIVDNLGCIVEASEIATEAYKGGPGILQKVCQIIDNYLAEC 
IK+G++ G ++E E+ATEAYKGGP IL+KV ++ Y + 



IFY+GPQIPNYAGTQFKK +E+T+ + E+ENDVNCAGLAEA+S 



GSAKD +ALCLTIGTGIGGCL+ + VFHG S+SACEVGY+HLSDG FQDLASTTAL+ 



GIMAQKDYL DKL +L LV+SLA+KT + FA H N AG+LGAYYHFK + 
Sbjct: 241 GIMAQKDYIiADKLKTALDSYLVSSLAKICTQLKFASHGNNAGILGAYYHFKQK 292

SEQ ID 922 (GBS331) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 60 (lane 2; MW 35.9kDa). It was also expressed in E.coli as a GST-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 67 (lane 3; MW 61kDa). 

The GBS331-GST fusion product was purified (Figure 209, lane 3) and used to immunise mice. The 
resulting antiserum was used for FACS (Figure 309), which confirmed that the protein is immunoaccessible 
on GBS bacteria. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 288 

A DNA sequence (GBSx0315) was identified in S.agalactiae <SEQ ID 925> which encodes the amino acid 
sequence <SEQ ID 926>. This protein is predicted to be an acylneuraminate lyase (nanA). Analysis of this
protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.0894 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAA69950 GB:Y08695 putative acylneuraminate lyase [Clostridium 
tertium] 

Identities = 162/225 (72%) , Positives = 191/225 (84%) 

Query: 1 MKDLQKYQGIIPAFYACYDDKGDICPERvKALTNYFIDKGVQGI)YVNGSSGECIYQSVAD 60 

M++L+KY+GIIPAFYACYDD+G I PER + T Y IDKGV+GLYV GSSGECIYQS + 
Sbjct: 1 MRNLEKYKGIIPAFYACYDDEGKISPERTQMFTQYLIDKGVKGLYVCGSSGECIYQSICEE 60

Query: 61 RKLVLENVMSVAKGKLTVIAHVACNlCTKDSVELAJlKaj^IGTOAIAAIPPIYFRLPEYAI 120

RK+ LENVM VAKGK+T+IAHV CNNT+DS ELS. HAE+IGVDAIA+IPPIYF LP+Y+I 
Sbjct: 61 RKITLENVMKVAKGKITIIAHVGCNNTRDSEELAEHAESIGVI5AIASIPPIYFHLPDYSI 120 

Query: 121 ADYWNTISQAAPQTDFIIYNIPQLAGVALTSDliYRKMLQNPQVIGVKNSSMPVQDIQNFV 180 

A+YWN IS AAP TDFIIYNIPQLAGV L +LY++ML+NP+VIGVKNSSMPVQDIQ F 
Sbjct: 121 AEYWNDISNAAPKTDFI1YNIPQLAGVGLGINLYKQMLKNPRVIGVKNSSMPVQDIQMFK 180 

Query: 181 AIGGENHIVFNGPDEQFLGGRLMGAAAGIGGTYGVMPELYLTLNQ 225 

I G+ +VFNGPDEQF+ GR+MGA GIGGTY VMPEL+L ++ 
Sbjct: 181 DISGDESWFNGPDEQFVAGRIMGADGGIGGTYAVMPELFLAADK 225 

A related DNA sequence was identified in S.pyogenes <SEQ ID 927> which encodes the amino acid 
sequence <SEQ ID 928>. Analysis of this protein sequence reveals the following: 

Possible site: 15 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.0981 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 238/304 (78%) , Positives = 263/304 (86%) 

Query: 1 MI<DLQKYQGIIPAFYACYDDKGDICPERVKALTNYFIDKGVQGLYVNGSSGECIYQSVAD 60 

M DL KYQGI IPAFYACYDD+G+I PERV+ALT Y+IDKGVQGLY+NGSSGECIYQSV D 
Sbjct: 1 MTDLTKYQGIIPAFYACYDDQGNISPERVRALTQYYIDKGV'QGLYINGSSGECIYQSVFD 60 

Query: 61 RKLVXENVMSVAKGICLTVIMIVACNNTKDSvELAMHAEAIGVDAIAAIPPIYFRLPEYAI 120 

R+LVLENVM+VAKGKLT+ 1 HVACNNTKDS+ELA H+E +GVDAIAAIPPIYFRLPEYA+ 
Sbjct: 61 RQLVXiENVMAVAKGKLTI INHVACNNTKDSIEIAAHSERLGVDAIAAIPPIYFRLPEYAV 120 

Query: 121 ADYWNTISQAAPQTDFI1YNIPQLAGVALTSDLYRKMLQNPQVIGVKNSSMPVQDIQNFV 180

ADYWN IS AAP TDFIIYNIPQLAGVALT LY+ ML N +VIGVKNSSMPVQDIQ F 
Sbjct: 121 ADYWNAISSAAPHTDFIIYNIPQI^GVALTPSLYKTMIJWICRVIGVKNSSMPVQDIQTFC 180 

Query: 181 AIGGENHIVFNGPDEQFLGGRLMGAaAGIGGTYGVMPELYLTLNQLIVDKDLEKARELQF 240 

AIGG++HIVFNGPDEQFLGGRLMGAAAGIGGTYG MPEL+L LNQLI DKDLEKA+ LQ+ 
Sbjct: 181 AIGGDDHIVFNGPDEQFLGGRLMGAAaGIGGTYGAMPELFLRLNQLIADKDLEKAKALQY 240 

Query: 241 TINDIITKLCSGHGNMYAVIKAVLEIMEQLTIGSVRLPIASVTEEDKPIIKEAAEMIRHA 300 

TIN+II L S HGHMY VIK VL INE L IGSVR PLA + EED+ I + AA +1 A 
Sbjct: 241 TINEIIGVLVSAHGNMYGVIKEVLRINEGLDIGSTOSPLAELVEEDRVICQRAAALINQA 300 

Query: 301 KKQF 304 
K+ F 

Sbjct: 301 KETF 304 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 289 

A DNA sequence (GBSx0317) was identified in S.agalactiae <SEQ ID 929> which encodes the amino acid 
sequence <SEQ ID 930>. Analysis of this protein sequence reveals the following: 

Possible site: 46

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood = -9.45 Transmembrane 82 - 98 ( 79 - 111) 
INTEGRAL Likelihood = -6.85 Transmembrane 24 - 40 ( 21 - 52) 
INTEGRAL Likelihood = -5.26 Transmembrane 180 - 196 ( 172 - 200)
INTEGRAL Likelihood = -4.35 Transmembrane 110 - 126 ( 106 - 130)

Final Results 

bacterial membrane --- Certainty=0.4779 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:BAB05827 GB:AP001514 unknown conserved protein in B. subtilis 
[Bacillus halodurans] 
Identities = 40/148 (27%) , Positives = 74/148 (49%) , Gaps = 4/148 (2%) 

Query: 14 VNNPFMQGCNVVFDLALLNLLFMI - TCLPLVTIG- - AAKISLYRTLWQKLEGD - QTNLLI 69 

+++ F Q C+ ++ LA +NLL++ T L LV +G A +++ L + G+ + 
Sbjct: 6 MSSRFYQTCDWIWKLAYINLLWLSGTLLGLVVLGFLPATTAMFTVLRKWFTGNPDVAITR 65 

Query: 70 LYIKHLKKEWFQGMLLGLVELSILWIIFDLTILHYQIGFIVSFLKITCYAFLLLTVMTS 129 

+ + K E+ + LLG V L ++ F+ L G + L + YAFL+L ++T 
Sbjct: 66 TFFQAYKNEFLKINLLGAVLLI^YILYFNY^LGTWGTVHMVLSLGWYAFLILYIITL 125 

Query: 130 IYLFPMAARYEMSLL.DTVKKSFIMACLN 157 

Y+ P Y 4- L +K + 1+ +N 
Sbjct: 126 FYIIPAYVHYNLKLFQYIKTALIIGFVN 153 

A related DNA sequence was identified in S.pyogenes <SEQ ID 931> which encodes the amino acid
sequence <SEQ ID 932>. Analysis of this protein sequence reveals the following: 

Possible site: 24
>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood =-14.86 Transmembrane 117 - 133 ( 108 - 

INTEGRAL Likelihood = - 

INTEGRAL Likelihood = - 

INTEGRAL Likelihood = - 

INTEGRAL Likelihood = - 



0 - 46 ( 21 - 

8 - 104 ( 83 - 

26 Transmembrane 165 - 181 ( 151 - 

89 Transmembrane 189 - 205 ( 182 - 207)



Final Results 

bacterial membrane --- Certainty=0.6944 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

>GP:BAB05582 GB:AP001513 unknown conserved protein in bacilli 
[Bacillus halodurans] 
Identities = 59/194 (30%) , Positives = 93/194 (47%) , Gaps = 11/194 (5%) 

Query: 17 SKWMRASAALFDLLVFNLLFVL-SCLPLLTIGV--AKMALYASLLDWREGQVS-QLVTTY 72 
+K M+ + L+ NLL++L S + + +GV A +L+A W + + L TY 

Sbjct: 8 TKIMKLFEWIMRLVYLNLLWLLFSF-GG;ILGVMPATASLFAVFRKWYQKEDDFPLFQTY 67

Query: 73 SSHFKYYFKSGLRLGLIELGIMTICLLDLFLIRNQSGLVFCGFKVLCVAVLFLWILFLY 132 

+ FK FK +GL +1 I LD+ L+ S + Q + A+ F+ ++ LY 
Sbjct: 68 LNEFKRSFKIANLVGLTLVLIGGILYLDVLLLLGTSHTVIGQLLLMGVGALSFIYLVTLLY 127 

Query: 133 AYPQAVKRDLSLSTLFKRSFLLAGLFFPWSFAFLAFICLTIFSLQL SLLTLFGGVS 188 

+P V DLS FK SFLL G+ P+ LI L++ +L LL LF S 

Sbjct: 128 IFPTLVHFDLSYKQYFKHSFLL-GVLQPFR-TLLLMITLSLSALLFLTFPILLPLF-AAS 184 

Query: 189 LLAIIGISSLTYLY 202 

+A + + S + Y 
Sbjct: 185 FMAALTMWSFLFGY 198 

An alignment of the GAS and GBS proteins is shown below: 

Identities = 68/210 (32%), Positives = 117/210 (55%)

Query: 3 KANQLIAAIFDVMNPFMQGCNWFDLALLNLLFMITCLPLVTIGAAKISLYRTLWQKLEG 62
K L+ ++F +++ +M+ +FDIi + NLLF+++CLPL+TIG AK++LY +L EG
Sbjct: 4 KKQGLLHSLFKLDSKWI^SAALFDLLVFNLLFVLSCLPLLTIGVAKMALYASLLDWREG 63

Query: 63 DQTNLLILYIKHLKKEWFQGMLLGLVELSII1WIIFDLTII1HYQIGFIVSFLKITCYAFL 122
+ L+ Y H K + G+ LGL+EL 1+ + + DL ++ Q G + K+ C A L
Sbjct: 64 QVSQLVTTYSSHFKYYFKSGLRLGLIELGIMTICLLDLFLIRNQSGLVFQGFKVLCTAVL 123

Query: 123 LLTVMTSIYLFPblAARYEMSLLDTVKKSFIMACIiNLKWTGVLMFLLIMTWFIMVQSSLLF 182
L V+ +Y +P A + ++SL K+SF++A L W+ +++TF+SL
Sbjct: 124 FLWILFLYAYPQAVKRDLSLSTLFKRSFLIAGLFFPWSFAFLAFICLTIFSLQLSLLTL 183

Query: 183 MLTVSAIFIFAYTAFAYFKIIILQKQFAYF 212
VS + I ++ Y F
Sbjct: 184 FGGVSLLAIIGISSLTYLYLIIMESLLRRF 213





A related GBS gene <SEQ ID 8535> and protein <SEQ ID 8536> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 2 
McG: Discrim Score: 3.27 
GvH: Signal Score (-7.5): -4.23 
Possible site: 46 



>>> Seems to have an uncleavable N-term signal seq
ALOM program count: 5 value: -9.45 threshold: 0.0
INTEGRAL Likelihood = -9.45 Transmembrane 82 - 98 ( 79 - 111)
INTEGRAL Likelihood = -6.85 Transmembrane 24 - 40 ( 21 - 52)
INTEGRAL Likelihood = -5.26 Transmembrane 180 - 196 ( 172 - 200)
INTEGRAL Likelihood = -5.10 Transmembrane 160 - 176 ( 158 - 179)
INTEGRAL Likelihood = -4.35 Transmembrane 110 - 126 ( 106 - 130)
PERIPHERAL Likelihood = 5.89 142
modified ALOM score: 2.39











*** Reasoning Step: 3 

Final Results 

bacterial membrane --- Certainty=0.4779 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>
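
The ALOM summary for this protein ("count: 5 value: -9.45") seems to correspond to the number of INTEGRAL rows and to the most negative likelihood among them. The sketch below checks that reading against the five rows listed for SEQ ID 8536. It is an illustration only: the regular expression and the names `parse_segments`, `core` and `extended` are labels chosen here, not terms taken from the prediction program.

```python
import re

# INTEGRAL rows as reported for SEQ ID 8536 (OCR spacing normalised).
ALOM_ROWS = """\
INTEGRAL Likelihood = -9.45 Transmembrane 82 - 98 ( 79 - 111)
INTEGRAL Likelihood = -6.85 Transmembrane 24 - 40 ( 21 - 52)
INTEGRAL Likelihood = -5.26 Transmembrane 180 - 196 ( 172 - 200)
INTEGRAL Likelihood = -5.10 Transmembrane 160 - 176 ( 158 - 179)
INTEGRAL Likelihood = -4.35 Transmembrane 110 - 126 ( 106 - 130)
"""

ROW = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+\.\d+)\s+Transmembrane\s+"
    r"(\d+)\s*-\s*(\d+)\s*\(\s*(\d+)\s*-\s*(\d+)\)"
)


def parse_segments(text):
    """Return one dict per predicted transmembrane segment."""
    segments = []
    for m in ROW.finditer(text):
        segments.append({
            "likelihood": float(m.group(1)),
            "core": (int(m.group(2)), int(m.group(3))),       # first range
            "extended": (int(m.group(4)), int(m.group(5))),   # range in parentheses
        })
    return segments


segments = parse_segments(ALOM_ROWS)
count = len(segments)
value = min(seg["likelihood"] for seg in segments)
core_lengths = {seg["core"][1] - seg["core"][0] + 1 for seg in segments}
```

With these rows `count` and `value` reproduce the summary line, and every first range spans 17 residues.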



The protein has homology with the following sequences in the databases: 

ORF00072(364 - 828 of 1260) 

EGAD|108353|BS3003 (14 - 171 of 222) hypothetical protein {Bacillus subtilis}
OMNI|NT01BS3507 conserved hypothetical protein GP|2635493|emb|CAB14987.1||Z99119 similar to
hypothetical proteins from B. subtilis {Bacillus subtilis}

GP|2293197|gb|AAC00275.1||AF008220 YteU {Bacillus subtilis} PIR|D69991|D69991 conserved
hypothetical protein yteU - Bacillus subtilis
%Match =5.9 

%Identity =26.6 %Similarity = 50.6 

Matches = 42 Mismatches = 74 Conservative Sub.s = 38 
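
The reported %Identity and %Similarity can be reproduced from the counts above if the denominator is taken to be the subject span (residues 14 to 171, i.e. 158 positions). That choice of denominator is an assumption made here because it matches the printed figures; the text does not state how the percentages were derived, and the variable names are illustrative.

```python
# Counts and subject coordinates copied from the hit description above.
matches = 42
mismatches = 74
conservative = 38
subject_start, subject_end = 14, 171

# Assumed denominator: the length of the aligned subject span.
aligned_length = subject_end - subject_start + 1

pct_identity = round(100 * matches / aligned_length, 1)
pct_similarity = round(100 * (matches + conservative) / aligned_length, 1)

# Positions not covered by the three counts (presumably gaps; an assumption).
unaccounted = aligned_length - (matches + mismatches + conservative)
```

Under this assumption similarity is counted as identical matches plus conservative substitutions, which gives the 50.6 shown in the text.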

270 300 330 360 390 417 441 471 

IMSKKGY*KC*WRKKYREYIVKKANQLIAAIFDVNN?FMQ3CNWFDIjALIi^ 

I : =1 III- III I I - I « 
MEHDGSLGRMLRFCEWIMRFAYTNLLWLFFTLLGLGVFGIMPATAALFAVMR 
10 20 30 40 50 

498 528 558 588 618 648 678 708 

QKLEG-DQTNLLILYIKHLKKEWFQGMLLGLVELSILWIIFDLTILHYQIGFIVSFLKITCYAFLLLTVMTSIYLFPMA 
• '••I I =1 : = I |:|> III I I 1 = 1 II =: I l» 1= I =1 I 1 = 11 = 

KWIQGQDOTPVLKTFWQEYKGEFFRSNLLGAVLALIGVIIYID)^I-Y?SHFLLHILRFAIMIFGFLFVSMLFYVFPLL 
70 80 90 100 110 120 130 

738 768 798 828 858 888 918 948 

ARYEMSLLDTVKKSFIMACIJ^KWTGVLMFLLIMTWFIMVQSSLLFMLTVSAIFIFAYTAFAYFKIIILQKQFAYFSKQQ
|| |:::: |::| :: | : :|:: 
VHFDWKIO^YVI<FSLLLSVAYLQYTLTMIALTOALFFLUaA'LFGIVPF?SVSLISYCHMRIVYAVLLIWEQHGGEPQRKS 



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.



Example 290 

A DNA sequence (GBSx0318) was identified in S.agalactiae <SEQ ID 933> which encodes the amino acid
sequence <SEQ ID 934>. Analysis of this protein sequence reveals the following:



>>> Seems to have no N-terminal signal sequence



Final Results 

bacterial cytoplasm --- Certainty=0.1827 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 



Query: 1 MIYDHLLNLTHYKDINPNLDLAIDYLLSHDLRNLDIGTYHISPEVILMVQSNQLSES-FD 59 

MI + L Y +NP+ ID+L L NL G+ I + L++ 
Sbjct: 1 MIITKISRLGTYVGVNPHFATLIDFLEKTGLENLTEGSIAIDGNRLFGNCFTYLADGQAG 60 

Query: 60 HIFEYHKKYLDIHYVIEGHEVIKLGKGDKVEV-EEY--LGDIGFIKCSEETSFDLRDNYI 116 

FE H+KYLDIH V+E E + + + V V +EY DI E LR 

Sbjct: 61 AFFETHQKYLDIHLVLENEEAMAVTSPEWSVTQEYDEEKDIELYTGKVEQLVHLRAGEC 120 

Query: 117 AFFFPEEAHQPNGMGSLGNYVKKGVLKVLMA 147 

FPE+ HQP 4- VKK V KV ++ 

Sbjct: 121 LITFPEDLHQPK-VRINDEPVKKWFKVAIS 150 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.



Example 291 

A DNA sequence (GBSx0319) was identified in S.agalactiae <SEQ ID 935> which encodes the amino acid
sequence <SEQ ID 936>. This protein is predicted to be sugar ABC transporter, permease protein (araQ). 
Analysis of this protein sequence reveals the following: 



Possible site: 35
>>> Seems to have a cleavable N-term signal seq.



INTEGRAL 
INTEGRAL 
INTEGRAL 



Likelihood 
Likelihood 
Likelihood 
Likelihood = -2 
Likelihood = -1 



Transmembrane 245 - 



59 



Final Results
bacterial membrane --- Certainty=0.3951 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database:

>GP:AAD35515 GB:AE001721 sugar ABC transporter, permease protein
[Thermotoga maritima] 
Identities = 94/262 (35%) , Positives = 158/262 (59%) , Gaps = 1/262 (0%) 



Query: 15 LILCLLTVLFIFPFYWIMTGAFKSQPDTIIIPPQWWPKAPTLENFKALTVQNPALRWLWN 74
+ + + V+F+ P ++ + +FK + PP +PK P+LE + + + IirtJ
Sbjct: 9 IFIVFMLWFMLPVFYAWSSFKPMSEIYSYPPTIFPKXPSLEGYINVIKEYDLLTYLRN 68

Query: 75 SVFISIMTMFLVCCTSSMAGYVLAKKHFYGQKILFSLFIAAMALPKQVVLVPLVRIINFM 134
++F++ + + S M GY LAK +F+G + + S+F M + QV++VPL +1 +
Sbjct: 69 TLWATVATVITVLVSVMTGYGLAKGKFWGIRPVNSMFTMTMFVSAQVIMVPLFWIRSL 128

Query: 135 GIHDTLWAVILPLVGWPFGVFLMKQFSENIPTELLESAKIDGCGEIRTFINVAFPIVKPG 194
G+ ++LW +I+P V P G+F+ Q+ ++IP ELLESAKIDG E + F + FP+ KP
Sbjct: 129 GLINSLWGLIIPAVYTPTGMFMAVQYMKDIPDELLESAKIDGAKEWQIFWRIVFPLSKPL 188

Query: 195 FAAIAIFTFINTim)YFMQI,WlLTSRNNLTlSI^VATMQAEM-ATNYGLIMAGAALAAVP 253
AALAIF+F WND+ + L+++ RN T+ L +AT+Q E + I+A + L +P
Sbjct: 189 VAALAIFSFTWRWNDFVLPLLWNRRNLYTLQLALATIQEEYGGAEWNTILAFSTLTIIP 248

Query: 254 IVTVFLVFQKSFTQGITMGAVK 275
+ +FL+FQ+ F +GI G +K
Sbjct: 249 TLIIFLLFQRLFMKGIMAGGLK 270



A related DNA sequence was identified in S.pyogenes <SEQ ID 937> which encodes the amino acid
sequence <SEQ ID 938>. Analysis of this protein sequence reveals the following: 



Possible site: 40
>>> Seems to have a cleavable N-term signal seq.

INTEGRAL Likelihood = -6.37 Transmembrane 245 - 261 ( 240 - 265)
INTEGRAL Likelihood = -5.15 Transmembrane 140 - 156 ( 139 - 158)
INTEGRAL Likelihood = -2.97 Transmembrane 111 -     ( 107 - 128)
INTEGRAL Likelihood = -2.87 Transmembrane 76 - 92 ( 75 - 93)
INTEGRAL Likelihood = -1.59 Transmembrane 188 - 204 ( 186 - 204)



Final Results
bacterial membrane --- Certainty=0.3548 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the databases:

>GP:CAB59597 GB:AL132662 probable sugar transport inner membrane
protein [Streptomyces coelicolor A3(2)]
Identities = 88/262 (33%) , Positives = 147/262 (55%)

Query: 15 VMLCVLTILFIFPFYWIMTGAFKAQADTIMIPPQWWP:<APTIENFKALWQNPALKWLWN 74

++L L ++F P W++ + + A+ PP WP + ++ ++ +W N 

Sbjct: 38 LLIiAPLALVFAVPLVWLVLSSVMSNAEINRFPPALWPSGIDLGGYRYVLGNAMFPRWFVN 97 

Query: 75 SVFISVATMFLVCGTSSLAGYAIiUCKRFYGQRLLFSIFIAAMALPKQVVLVPLVRIVNFM 134 
S+ +S T+ SLAGYA A+ RF G R+L + +A MA+P Q+ ++P ++ +

Sbjct: 98 SLIVSAVTVAANLVFGSLAGYAFARMRFAGSRVLMGLMIiATMAVPFQLTMIPTFLVMKKL 157 

Query: 135 GIHDTLAAVILPLVGWPFGVFLMKQFSENIPTELLESAKIDGCGEIRTFFNVAFPIVKPG 194 
G+ DTL A+I+P + PF VFL++QF ++P EL E+A IDGC +R + + P+ +P 
Sbjct: 158 GLIDTLGALIVPSLVTPFAVFLLRQFFLSLPREL3EAAWIDGCSRLRVLWRIVLPLSRPA 217

Query: 195 FAAIiAIFTFINTraTOYFMQLVMLTSRENLTISLGVATMQAEMATNYGLIMAGAAMAAVPI 254 

A +A+ TF+ TWND L+ + T+ LG+ T Q + T + +MAG + +P+ 

Sbjct: 218 lATVAVLTFLTTmTOLTWPLIAINHDTQYTLQLGLTTFQ^QHHTQWAAVmGWITVLPV 277 


Query: 255 VTVFLVFQKSFTQGITMGAVKG 276 

+ FL QK+F Q IT +KG 
Sbjct: 278 LLAFLGAQKTFIQSITSSGLKG 299 



An alignment of the GAS and GBS proteins is shown below:

Identities = 245/276 (88%) , Positives = 262/276 (94%)

Query: 1 MKKKTPSAYNFLTALILCLLTVLFIFPPYWIMTG?.FKSQPDTIIIPPQWWPKAPTLENFK 60 

M KK +A + LT ++LC+LT+LFIFPFYWIMTGAFK+Q DTI+IPPQWWPKAPT+ENFK 
Sbjct: 1 MTKKKLTASDILTTVMLCVLTILFIFPFYt'JIMTGAFKRQADTIMIPPQtWPKaPTIENFK 60

Query: 61 ALWQlffALRWLVmSVFISIMTMFLVCCTSSMAGYVLAKKRFYGQKILFSLFIAAMALPK 120 

AL VQNPAL+WLWNSVFIS+ TMFLVC TSS+AGY IAKKRFYGQ++LFS+FIAAMALPK 
Sbjct: 61 ALWQNPALKWLWSVFISVATMFLVCGTSSIAGYALAKKRFYGQRLLFSIFIAAMALPK 120 

Query: 121 QWLVPLWIINFMGIHDTLWAVILPLVGWPFGVFLMKQFSENIPTELLESAKIDGCGEI 180 

QWLVPLVRI +NFMGIHDTL AVILPLVGWPFGVFLMKQFSENIPTELLESAKIDGCGEI 
Sbjct: 121 QWLVPLVRIVNFMGIHDTLAAVILPLVGWPFGVFLMKQFSENIPTELLESAKIDGCGEI 180 

Query: 181 RTFINVAFPIVKPGFAALAI FTFIKTWNDYFMQLVMLTSRNNLTI SLGVATMQAEMATNY 240

RTF NVAFPI VKPGFAALAI FTFINTWNDYFMQLVMLTSR NLTI SLGVATMQAEMATNY 
Sbjct: 181 RTFF]WAFPIVKPGFAAIAIFTFI^^IWDYFMQLVMLTSRENLTISLGVATMQAEMATNY 240 

Query: 241 GLIMAGAALAAVPIVTVFLVFQKSFTQGITMGAVKG 276 

GLIMAGAA+AAVPIVTVFLVFQKSFTQGITMGAVKG 
Sbjct: 241 GLIMAGAAMAAVPIVTVFLVFQKSFTQGITMGAVKG 276 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
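
The "Identities" and "Positives" totals in an alignment header can be tallied from the middle line of each block: a letter marks an identical residue and "+" marks a conservative substitution. The sketch below applies this to the last block of the GAS/GBS alignment above, which is clean enough in the scanned text to be used as is. The function name `score_block` is an illustrative assumption.

```python
# Final block of the GAS/GBS alignment above (query residues 241 to 276).
QUERY = "GLIMAGAALAAVPIVTVFLVFQKSFTQGITMGAVKG"
MATCH = "GLIMAGAA+AAVPIVTVFLVFQKSFTQGITMGAVKG"
SBJCT = "GLIMAGAAMAAVPIVTVFLVFQKSFTQGITMGAVKG"
QUERY_START, QUERY_END = 241, 276


def score_block(query, match, sbjct):
    """Count identities, positives and gap characters in one alignment block."""
    if not (len(query) == len(match) == len(sbjct)):
        raise ValueError("the three rows of a block must have equal length")
    identities = sum(
        1 for q, m, s in zip(query, match, sbjct) if m.isalpha() and q == s
    )
    positives = identities + match.count("+")  # identical or conservative
    gaps = query.count("-") + sbjct.count("-")
    return identities, positives, gaps


identities, positives, gaps = score_block(QUERY, MATCH, SBJCT)
residues = len(QUERY.replace("-", ""))  # gap characters do not consume positions
```

The residue count also agrees with the printed coordinates, since 276 - 241 + 1 = 36. Summing such per-block tallies over a whole alignment would give the header totals, provided the rows are free of scanning errors.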

Example 292 

A DNA sequence (GBSx0320) was identified in S.agalactiae <SEQ ID 939> which encodes the amino acid 
sequence <SEQ ID 940>. Analysis of this protein sequence reveals the following: 

Possible site: 31 

>>> Seems to have a cleavable N-term signal seq.

INTEGRAL Likelihood =-10.83 Transmembrane 74 - 90 ( 64 - 96) 

INTEGRAL Likelihood = -6 

INTEGRAL Likelihood = -5 

INTEGRAL Likelihood = -5 

INTEGRAL Likelihood = -0 



Transmembrane 108 - 124 ( 107 - 126) 

0 - 286 ( 265 - 290) 

1 - 177 ( 156 - 182) 
Transmembrane 219 - 235 ( 219 - 235) 



Final Results 

bacterial membrane --- Certainty=0.5331 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:BAB05584 GB:AP001513 sugar transport system (permease) (binding 
protein dependent transporter) [Bacillus halodurans] 
Identities = 106/289 (36%) , Positives = 168/289 (57%) , Gaps = 6/289 (2%) 

Query: 9 RETMIAYAFLAPILLFFLIWFAP^GFVTSFFNYSM-TQFTFIGLANYNRMF-HDSIF 66 

+E Y F+AP ++ F IF PM+ SF ++ + + + G NY R+F D +F 

Sbjct: 25 KEYFWGYLFIAPPIIGFAIFALGPMLYSIYVSFTDFDLYNEPVWTGADNYYRLFVTDDLF 84 

Query: 67 MKSLI^ITVIIVIGSVPWVFFSLFVAAOTYEKNVFSRSFYROTFFLPVOTGSVAVTVVWK 126 

K++ NT +G +P+ + SL +A +K V + +R FFLP V+ VA+T++W+ 
Sbjct: 85 RKTVFNTFYAALG-IPIGMAVSLGIAVALNQK-VKGIALFRTAFFLPAVSSWAITLLWR 142 

Query: 127 WIYDPMSGII^NYILKSG^IEQNISlCfiDKHWAIJjAIIIILLTTSVGQPIILYIAAMGNI 186 

WI++ G+LN +L +V WL D+ WA+ A+II + +G +ILY+AA+ + 

Sbjct: 143 WIFNADFGLLNIMLN- -YVGIHGPGWLSDEKWAMPAMI IQGVWGGLGINMILYLAALQGV 200 

Query: 187 DNSLCEAARVTKSAlffiMQVFWQIKWSLLPTTLYIAVITTINSFQCFALIQLLTSGGPNYS 246 

+ +L EAA +DG N Q F I PS+ PTT +1 + +TI + Q F ++T GGPNYS 
Sbjct: 201 NPALYEAADIDGGNAWQKFIHITvPSISPTTFFILITSTIGALQDFQRFMIMTEGGPNYS 260 

Query: 247 TSTLMYYLYEKAFKLSEYGYANTMGVFLAW1IALISFAQFKILGNDVEY 295

A related DNA sequence was identified in S.pyogenes <SEQ ID 941> which encodes the amino acid
sequence <SEQ ID 942>. Analysis of this protein sequence reveals the following: 

Possible site: 13
>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood =-12.74 Transmembrane 55 - 71 ( 44 - 78) 

INTEGRAL Likelihood =-10 

INTEGRAL Likelihood = -6 

INTEGRAL Likelihood = -6 
INTEGRAL Likelihood = -5 
INTEGRAL Likelihood = -0. 



109 - 125 { 98 - 130) 

304 - 320 ( 299 - 324) 

142 - 158 ( 141 - 160) 

Transmembrane 196 - 212 ( 190 - 216) 

3 - 269 ( 253 - 269) 



Final Results 

bacterial membrane --- Certainty=0.6095 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

>GP:BAB05584 GB:AP001513 sugar transport system (permease) (binding 
protein dependent transporter) [Bacillus halodurans] 
Identities = 113/310 (36%) , Positives = 176/310 (56%) , Gaps = 9/310 (2%) 

Query: 25 KVEQKKEVFQVNVNKLKMR---ETLISYAFLAPVLVFFVIFvLIPMIMGFVTSFFNYSM- 80 

+VE +E K K R E Y F+AP ++ F IF L PM+ SF ++ + 

Sbjct: 4 EVETPRETKTTKARKQKRRLNKEYFWGYLFI APPI IGFA1 FALGPMLYS I YVSFTDFDLY 63 

Query: 81 TEFTFVGFANYARMF - QDP I FMKSLHJTLI I VIGSVPWVFFSLFVAAKTYDKNWARSF 139 

E + G NY R+F D +F K++ NT +G +P+ + SL +A K V + 

Sbjct: 64 NEPVWTGADNYYRLFWDDLFRKTVFOTFYAALG-IPIGMAVSLGIAVALNQK-VKGIAL 121 

Query: 140 YRAVFFLPVVTGSVAVTVVWKWIYDPMSGIIJ^YVLKYAHV-IEQNISWLGDBaiWAIiLAIIV 199 

+R FFLP V+ VA+T++W+WI++ G+LN +L Y + WL D+ WA+ A+I + 

Sbjct: 122 FRTAFFLPAVSSWAITLLWRWIFNADFGLLNIMLNYVGI--HGPGWLSDEKWAMPAMII 179 

Query: 200 ILLTTSVGQPIILYIAAMGNIDNSLVEAARVDGATEFQVFWNIKWPSLLPTTLYIAVITT 259 

+ +G +ILY+AA+ ++ +L EAA +DG +Q F +1 PS+ PTT +1 + +T 
Sbjct: 180 QGVWGGLGMILYLAALQGTOPALYFAADIDGGNAWQKFIHITVPSISPTTFFILITST 239 

Query: 260 INSFQCFALIQLLTSGGPNYSTSTLMYYLYEKAFKLSEYGYANTMGVFLAVMIAIISFAQ 319 

I + Q F ++T GGPNYST+T+ + YYL+ AF+ E GYA+ M L ++I 11+ 
Sbjct: 240 IGALQDFQRFMIMTEGGPNYSTTT7VYYLFIMAFRYMEMGYASAMAWVLGIIILIITIIN 299 

Query: 320 FKILGNDVEY 329 

FK+ V Y 
Sbjct: 300 FKLAKKWVHY 309 

An alignment of the GAS and GBS proteins is shown below: 

Identities = 263/295 (89%), Positives =278/295 (94%) 

Query: 1 MRTNKLKMRETMIAYAFLAPILLFFLIFVFAPMVMGFVTSFFNySMTQFTFIGLANYNRM 60 

+ NKLKMRET+I+YAFLAP+L+FF+IFV PM+MGFVTSFFNYSMT+FTF+G ANY RM 
Sbjct: 35 VNTOKLKMRETLISYAFLAPVLVFFVIFVLIPMIMGFVTSFFNYSMTEFTFVGFANYARM 94 

Query: 61 FHDSIFMKSLINTVIIVIGSVPVWFFSLFVAANTYEKNVFSRSFYRCVFFLPVVTGSVA 120 

F D IFMKSLINT+IIVIGSVPWVFFSLFVAA TY+KNV +RSFYR VFFLPWTGSVA 
Sbjct: 95 FQDPIFMKSLINTLIIVIGSVPVVVFFSLFWAAKTYDKNWARSFYRAVFFLPVVTGSVA 154 

Query: 121 VTWWKWIYDPMSGILNYILKSGHVIEQNISVILGDKHWALLAIIIILLTTSVGQPIILYI 180 

VTWWKWIYDPMSGILNY+LK HVIEQNISWLGDKHWALIAII+ILLTTSVGQPIILYI 
Sbjct: 155 VTVVWKWIYDPMSGILNYVLKYAWIEQNISWI^KHWALIAIIVILLTTSVGQPIILYI 214 

Query: 181 AAMGNIDNSLCEAARVDGANEMQVFWQ1KWPSLLPTTLYIAVITTINSFQCFALIQLLTS 240 
Query: 241 GGPNYSTSTLMYYLYEKAFICLSEYC-YAOTHC-VFLAWIAL.ISFAQFKILGNDVEY 295 

GGPIWSTSTLMYYLYEKAFKLSEYGYAISriMGVFLAVMIA+ISFAQFKILGKDVEY 
Sbjct: 275 GGPNYSTSTLMYYLYEKAFKLSEYGYANTMGVFLAVMIAIISFAQFKILGNDVEY 329 
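
The header of the GAS/GBS alignment above gives each statistic as a count, an aligned length and a whole-number percentage ("Identities = 263/295 (89%), Positives =278/295 (94%)"). A minimal sketch for reading such a header and recomputing the percentages is given below. The helper names are hypothetical, and rounding down to a whole number is an assumption that agrees with this header; it is not claimed to be how the original program derived the figures.

```python
import re

# Matches fields such as "Identities = 263/295 (89%)" and tolerates the
# irregular spacing seen in the alignment headers.
FIELD = re.compile(
    r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)\s*\(\s*(\d+)\s*%\s*\)"
)

def parse_summary(line):
    """Return {field: (count, aligned_length, reported_percent)}."""
    return {
        name: (int(count), int(length), int(percent))
        for name, count, length, percent in FIELD.findall(line)
    }

def truncated_percent(count, length):
    """Whole-number percentage, rounded down (an assumption, see lead-in)."""
    return count * 100 // length

summary = parse_summary("Identities = 263/295 (89%), Positives =278/295 (94%)")
for name, (count, length, reported) in summary.items():
    # Print the reported value next to the recomputed one for comparison.
    print(name, count, length, reported, truncated_percent(count, length))
```

For this header the recomputed values match the reported 89% and 94%.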

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 293 

A DNA sequence (GBSx0321) was identified in S.agalactiae <SEQ ID 943> which encodes the amino acid 
sequence <SEQ ID 944>. Analysis of this protein sequence reveals the following: 

>>> Seems to have a cleavable N-term signal seq. 

Final Results 

bacterial outside --- Certainty=0.3000 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the GENPEPT database: 

jar-binding protein [Bacillus subtilis] 
! = 90/187 (47%) , Gaps = 14/187 (7%) 

Query: 19 MFACVDSSQSVMAAEKD - KVEITWWAFPTFTQEKAKDGVGTYEKKVI KAFEKKNPNI KVK 77 

MF+ + + ++D + I WW + D Y KVI+ +EKKNP++ ++ 
Sbjct: 1 MFSGCSAGEEASGKKEDVTLRIAWWG GQPRHD YTTKVIELYEKKNPHVHIE 51 

Query: 78 LETIDFTSGPEKITTAIEAGTAPDVLFDAPGR1 IQYGKNGKLADLNDLFTDQFIKDVN- - 135 

E ++ +K+ AG PDV+ + QYGK +L DL D I DV+ 

Sbjct: 52 AEFANWDDYWKKLAPMSAAGQLPDVI QMDTAYLAQYGKKNQLEDLTPYTKDGT I - DVSS 1 110 

Query: 136 NKNI IQASKSGDKAYMYPI SSAPFYMAFNKTOILKDAGVIjKLVKEGWTTSDFEKVLKMjKN 195 

++N++ K +K Y + + + N+ +LK AGV + +E WT D+EK+ L+ 

Sbjct: 111 DENMLSGGKIDNK1YGFTLGVNVLSVIANEDLLKKAGV-SINQENWTWEDYEKLAYDLQE 169 

Query: 196 KGYTPGS 202 

K GS 
Sbjct: 170 KAGVYGS 176 

A related DNA sequence was identified in S.pyogenes <SEQ ID 945> which encodes the amino acid 
sequence <SEQ ID 946>. Analysis of this protein sequence reveals the following: 

Possible site: 20 

>>> May be a lipoprotein 

Final Results 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the databases: 

!GB:Z99107 similar to sugar-binding protein [Bacillu. . . 82 2e-14 

>GP:CAB12516 GB:Z99107 similar to sugar-binding protein [Bacillus subtilis] 
Identities = 105/446 (23%) , Positives = 176/446 (38%) , Gaps = 71/446 (15%) 



Query: 24 GKSQKEAGASKSDTAKTEITWWAFFVFTQEKAECGVGTYEKKLIAAFEKANPEIKVKLET 83 
G S E + K + I WW + D Y K+I +EK NP + ++ E 
Sbjct: 4 GCSAGEEASGKKEDVTLRIAWWG GQPRHD YTTKVIELYEKKNPHVHIEAEF 54 

Query: 84 IDFTSGPEKITTAIEAGTAPDVLFDAPGRIIQYGKNGKLADLNDLFTEEFTKDVN--NDK 141 

++ +K+ AG PDV+ + QYGK +L DL +T++ T DV+ ++ 

Sbjct: 55 ANTODYWKKIAPMSAAGQLPDVIQMDTAYLAQYGKKNQLEDLTP-yTKDGTIDVSSIDEN 113 

Query: 142 LIQASKAGDTAYMYPISSAPFYMAI^KKMLKDAGVLDLVKEGWTTDDFEKVLKALKDK-- 199 

++ K + Y + + + N+ +LK AGV + +E WT +D+EK+ L++K 

Sbjct: 114 MLSGGKIDNKLYGFTLGVNVLSVIANEDLLKKAGV- S INQENWTWEDYEKLAYDLQEKAG 172 

Query: 200 GYNPGSFFANGQGGDQGPRAFFANLYSSHITDDKV TKYTT 239 

G +P F +G R + + DD++ T T 

Sbjct: 173 VYGSNGMHPPDIFFPYYLRTKGERFYKEDGTGLAYQDDQLFVDYFERQLRLVKAKTSPTP 232 

Query: 240 DDANS1KAMTKISNWIKDGLMMNGSQYDGSADIQNFANGQTSFTILWAPAQPGIQAKLLE 299 

D++ IK M +D ++ G SA N++W F A+L + 

Sbjct: 233 DESAQIKGM EDDFIVKGK SAI TWNYSNQYLGF ARLTD 269 

Query: 300 ASKVDYLEIPFPSDDGKPELEYLWGFAVFNNKDEQKVAASKTFIQFIADDKEWGPKNW 359 

+ YLP+L+ EK A+K FI F +++E + + 

Sbjct: 270 SPLSLYLP PEQMQEKALTLKPSMLFSIPKSSEHKKEAAK-FINFFVMNEE-ANQLIK 324 

Query: 360 RTGAFPVRTSYGDLYKDKRMEK IAEWTKFYSPYYNTID GFAEMRTLWFPMVQ 411 

PV DKKE4- IE++S + D G AE+ L 

Sbjct: 325 GERGVPVSDIOTADAIKPKLNEEETNIVEYVETASKNISKADPPEPVGSAEVIKLLKDTSD 384 

Query: 412 AVSNGDEKPEDALKAFTEKANKTIKK 437 

+ PE A K F +KAN4- +++ 

Sbjct: 385 QI LYQKVS PEKAAKTFRKKANEI LER 410 

An alignment of the GAS and GBS proteins is shown below: 

Identities = 352/438 (80%), Positives = 384/438 (87%), Gaps = 4/438 (0%) 

Query: 1 MSIKKSVIGFCLGAAALSMFACVDSSQSVMAAEKD KVEITWWAFPTFTQEKAKDGVG 57 

M++KK LGA+ L + AC SQ A K K EITWWAFP FTQEKA+DGVG 

Sbjct: 1 I»lNMKE01iASIiAMLGASVLGLAACGGKSQKFAGASKSDTAKTEITWWAFPVFTQEKAEDGVG 60 

Query: 58 TYEKKVIKAFEKKNPNIKVKLETIDFTSGPEKITTAIEAGTAPDVLFDAPGRIIQYGKNG 117 

TYEKK+I AFEK NP IKVKLETIDFTSGPEKITTAIEAGTAPDVLFDAPGRIIQYGKNG 
Sbjct: 61 TYEKKLIAAFEKANPEIKVKLETIDFTSGPEKITTAIEAGTAPDVLFDAPGRIIQYGKNG 120 

Query: 118 KLADRTOLFTDQFIKDVIMKNIIQASKSGDKAY^PISSA 177 

KLADIiNDLFT++F KDVNN +IQASK+GD AYMYPISSAPFYMA NKKMLKDAGVL LV 
Sbjct: 121 KLADUTOLFTEEFTKDVNM3KLIQASKAGDTAY^PISSAPFYMALNKKMLKD 180 

Query: 178 KEGWTTSDFEKVLKALKNKGYTPGSFFANGQGGDQGPRAFFANLYSAPITDKEVTKYTTD 237 

KEGWTT DFEKVLKALK+KGY PGSFFANGQGGDQGPRAFFANLYS+ ITD +VTKYTTD 
Sbjct: 181 KEGWTTDDFEKVLKALKDKG™pGSFFAKGQGGDQSPRAFFAI>ILYSSHITDDKVTKYTTD 240 

Query: 238 TKNSVKSMKKIVEWIKKGYLMNGSQYDGSADIQNFANGQTAFTILWAPAQPKTQAKLLES 297 

KS+K+M KI WIK G +MNGSQYDGSADIQKFAIIGQT+FTILWAPAQP QAKLLE+ 
Sbjct: 241 DANSIKAMTKISNWIKDGLMMNGSQYDGSADIQNFANGQTSFTILWAPAQPGIQAKLLEA 300 

Query: 298 SKVDYLEVPFPSEDGKPDLEYLVNGFAVFNNKDENKVKASKKFITFIADDKKWGPKDVIR 357 

SKVDYLE+PFPS+DGKP+LEYLWGFAVFNNKDE KV ASK FI FIADDK+WGPK+V+R 
Sbjct: 301 SKVDYLEIPFPSDDGKPELEYLWGFAVFNNKDEQKVAASKTFIQFIADDKEWGPKNVVR 360 

Query: 358 TGAFPVRTSFGDLYKGDKOTWKISKKTQYYSPYYNTIDGFSEMRTLWFPMVQSVSNGDEK 417 

TGAFPVRTS+GDLYK DKRM KH-1-WT++YSPYY1JTIDGF+EMRTLWFPMVQ4-VSNGDEK 
Sbjct: 361 TGAFPVRTSYGDLYK-DKRMEKIAEWTKFYSPYYNTIDGFAEMRTLWFPMVQAVSNGDEK 419 

Query: 418 PADALKDFTQKANDTIKK 435 

P DALK FT+KAN TIKK 
Sbjct: 420 PEDALKAFTEKANKTIKK 437 
A related GBS gene <SEQ ID 8537> and protein <SEQ ID 8538> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 4 
McG: Discrim Score: 5.05 
GvH: Signal Score (-7.5): 4.69 

Possible site: 31 
>>> Seems to have a cleavable N-term signal seq. 
ALOM program count: 0 value: 7.69 threshold: 0.0 
PERIPHERAL Likelihood =7.69 90 
modified ALOM score: -2.04 

*** Reasoning Step: 3 

Final Results 

bacterial outside --- Certainty=0.3000 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the databases: 

28.8/48.4% over 409aa 

Bacillus subtilis 

EGAD|107689| hypothetical protein Insert characterized 
GP|2633010|emb|CAB12516.1||Z99107 similar to sugar-binding protein Insert characterized 
PIR|F69796|F69796 sugar-binding protein homolog yesO - Insert characterized 

ORF01146(355 - 1605 of 1914) 
EGAD|107689|BS0697(1 - 410 of 412) hypothetical protein {Bacillus subtilis} 
GP|2633010|emb|CAB12516.1||Z99107 similar to sugar-binding protein {Bacillus subtilis} 
PIR|F69796|F69796 sugar-binding protein homolog yesO - Bacillus subtilis 
%Match = 5.4 
%Identity = 28.8 %Similarity = 48.3 
Matches = 69 Mismatches = 116 Conservative Sub.s = 47 

318 348 378 435 465 495 525 

RGIVMSIKKSVIGFCLGAAALSMFACVDSSQSVMAfiEim- 

11= ::| = I II I |||: :||llh = 

MFSGCSAGEEASGKKEDVTLRIAWW GGQPRHDYTTKVIELYEKKNPHVH 



VKLETIDFTSGPEKITTAIEAGTAPDvLFDAPGRIIQYGKNGKIADI^LFTDQFIIO)VN-NICNIIQASKSGDI<AYMYPI 
= = I = = =1= II llh : llll =1 || I | : ::):= | :| | : : 

IFAEFANWDDYWKKLAPMSAAGQLPDVIQTOTAYIAQYGKKNQLEDLTEYTKDGTIDVSSIDENMLSGGKIDNKLYGFTL 



SSAPFYMAFNKKMLKDAGVLKLVI^GWTTSDFEKVLKALKNKGYTPGSFFANGQGGDQGPRAFFANLYSA 

= = 1= =11 III = =1 II 1=11= hi I = :|| : | || 
GVNVLSVIANEDLLKKAGV- SINQENWTWEDYEKLAYDLQEK- - -AGVYGSNGM- - - HPPDI FFPYYLRTKGERFYKEDG 
140 150 160 170 180 190 200 

990 1020 1050 1080 

PITDKEVTKYTTDTKNSVKSMKKIVEWIKKGYLMNGSQYDGSA 

h III || : = | 

TGLAYQDDQL NIVEYVETASKNISKADPPEPVGSAEVIKLLKDTSDQILYQKV --- 

350 360 370 380 390 

1515 1545 1575 1605 1635 1665 1695 1725 

FSEMRTLWFPMVOJSVSNGDEKPADMiKD^ 

llll =111= === 
SPEKAAKTFRKKANEILERNN 
SEQ ID 944 (GBS16) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 3 (lane 9; MW 49kDa). 

The GBS16-His fusion product was purified (Figure 92A; see also Figure 189, lane 9) and used to immunise 
mice (lane 1 + 2 product; 20 µg/mouse). The resulting antiserum was used for Western blot (Figure 92B), 
FACS (Figure 92C), and in the in vivo passive protection assay (Table III). These tests confirm that the 
protein is immunoaccessible on GBS bacteria and that it is an effective protective immunogen. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 294 

A DNA sequence (GBSx0322) was identified in S.agalactiae <SEQ ID 947> which encodes the amino acid 
sequence <SEQ ID 948>. Analysis of this protein sequence reveals the following: 

Possible site: 49 

>>> Seems to have no N-terminal signal sequence 



bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 

A related GBS nucleic acid sequence <SEQ ID 9459> which encodes amino acid sequence <SEQ ID 9460> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAC66999 GB:AE001166 conserved hypothetical protein [Borrelia 
burgdorferi] 

Identities = 107/225 (47%) , Positives = 147/225 (64%) , Gaps = 6/225 (2%) 



v FIT TM E+D+L + + +IALD T R R DG+ 4 



t EGKI TP +A + +GV +WGGAITRP EI ++F+ ++ 
EVEGKIDTPLKAQKCFEMGVDLWVGGAITRPJiEITKKFVEKIN 228 

A related DNA sequence was identified in S.pyogenes <SEQ ID 949> which encodes the amino acid 
sequence <SEQ ID 950>. Analysis of this protein sequence reveals the following: 

Possible site: 44 
>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -1.49 Transmembrane 175 - 191 ( 175 - 192) 

Final Results 

bacterial membrane --- Certainty=0.1595 (Affirmative) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 





The protein has homology with the following sequences in the databases: 

>GP:AAD28762 GB:AF130859 putative N-acetylmannosamine-6-P epimerase 
[Clostridium perfringens] 
Identities = 113/225 (50%) , Positives = 148/225 (65%) , Gaps = 5/225 (2%) 

Query: 10 LMEQDKGGIIVSCQALPGEPIiYSETGGIMPLMRKAAQEAGAVGIRANSVEDIKEIQAITD 69 

+++ +KG + IVSCQAL EPL+S IM MA AA++ GA IRA + DI EI+ +T 
Sbjct: 1 MLDWKGNLIVSCQALSDEPLHSSF--IMGRMAIAftKQGGAAAIRAQGIDDINEIKEVTK 58 

Query: 70 LPIIGIIiQCDYPPQEPFITATMTEVDQIiAAIiNIAVIAMDCTKRDRHDGIiDIASFIEQVKE 129 

LPIIGIIK++Y E +IT TM EVD+L + +1 +D TKR R +G +1 + + 
Sbjct: 59 LPIIGIIKRNYDDSEiyiTPTMXEVDELLKTDCEMIGLDATKRKRPNGENIKDLVDAIHA 118 

Query: 130 KYPNQLLMADISTFDEGLVAHQAGIDFVGTTLSGYTPY3RQHAGPDVALIFALCK-AGIA 188 

K +L MADIST +EG4 A + G D V TTLSGYTPYS+Q D L+E L K I 
Sbjct: 119 K--GRLAMADISTLEEGIEAEKLGFDCVSTTLSGYTPYSKQSNSVDFELLEELVKTVKIP 176 

Query: 189 VIAEGKIHSPEEAKKINDLGVAGIWGGAITRPKEIAERFIEALK 233 

VI EG+I++PEE KK DLG WGGAITRP++I +RF + LK 
Sbjct: 177 VICEGRINTPEELKKALDLGAYSAWGGAITRFQQITKRFTDILK 221 

An alignment of the GAS and GBS proteins is shown below: 

Identities = 172/227 (75%) , Positives = 202/227 (88%) 

Query: 5 SKEAFKKQIKNGIIVSCQALPGEPLYTESGGVMPLLALAAQEAGAVGIRANSVRDIKEIQ 64 

+KE +Q+K GIIVSCQALPGEPLY+E+GG+MPL+A AAQEAGAVGIRANSVRDIKEIQ 
Sbjct: 6 TKEKLMEQLKGGIIVSCQALPGEPLYSETGGIMPLMAKAAQEAGAVGIRANSWDIKEIQ 65 

Query: 65 EVTNLPI IGI IKREYPPQEPFITATMTEVDQIASLDIAVIALDCTLRERHDGLSVVEFIQ 124 

+T+LPIIGIIK++YPPQEPFITATMTEVDQLA+L+IAVIA+DCT R+RHDGL + FI + 
Sbjct: 66 AITDLPI IGI IKKDYPPQEPFITATMTEVDQLAALNIAVIAMDCTKRDRHDGLDIASFIR 125 

Query: 125 KIKRKYPEQLLMADISTFEEGKNAFE^VDFVGTTLSGYTDYSRQEEGPDIELtNKLCQA 184 

++K KYP QLLMADISTF+EG A +AG+DFVGTTLSGYT YSRQE GPD+ L+ LC+A 
Sbjct: 126 QVKEKYPNQLLMADISTFDEGLVAHQAGIDFVGTTLSGYTPYSRQEAGPDVALIEALCKA 185 

Query: 185 GIDVIAEGKIHTPKQANEINHIGVAGIWGGAITRPKEIAERFISGL 231 

GI VIAEGKIH+P++A +IN +GVAGIWGGAITRPKEIAERFI h 
Sbjct: 186 GIAVIAEGKIHSPEEAKKINDLGVAGIWGGAITRPKEIAERFIEAL 232 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 


Example 295 

A DNA sequence (GBSx0323) was identified in S.agalactiae <SEQ ID 951> which encodes the amino acid 
sequence <SEQ ID 952>. This protein is predicted to be group B streptococcal surface immunogenic 
protein. Analysis of this protein sequence reveals the following: 

>>> Seems to have a cleavable N-term signal seq. 

Final Results 

bacterial outside --- Certainty=0.3000 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 

A related DNA sequence was identified in S.pyogenes <SEQ ID 953> which encodes the amino acid 
sequence <SEQ ID 954>. Analysis of this protein sequence reveals the following: 

Possible site: 25 
>>> Seems to have a cleavable N-term signal seq. 

Final Results 

bacterial outside --- Certainty=0.3000 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 



An alignment of the GAS and GBS proteins is shown below: 

Identities = 182/437 (41%) , Positives = 240/437 (54%) , Gaps = E 



Query: 1 MKtWKKVLLTSTMAASLLSVASVQAQETDTTlTOARWSEVKADLVKQDNKSSYTVKYGDT 60 

M + KK L +++A SL+ +A+ QAQE WT R+V+E+fO+LV DN +YTVKYGDT 

Sbjct: 1 MIITKKSLFVTSVALSLVPLATAQAQE OTPRSVTEIKBELVLVDNVFTYTVKYGDT 56 

Query: 61 LSVISEAMSIDM1WIAKINNIADINLIYPETTLTVTYDQKSHTATSMKIETPATNAAGQT 120 

LS I+EAM ID++VL IN+IA+I+M+P+T LT Y+Q AT++ ++ PA++ A + 

Sbjct: 57 LSTIAEAMGIDVHVLGDINHIANIDLIFPETILTANYNQKGQ-ATNLTVQAPASSPASVS 115 



Query: 121 TATVDLKTNQVSVADQKVSLNTISEGMTP-EAAT7IVSPMKTYSSAPALKSKEVLAQEQA 179 

Q S Q ++ TP + TT + K SS A S E+ + 

Sbjct: 116 HVPSSEPLPQASATSQPTV- -PMAPPATPSDVPTTPFASAKPDSSVTA- -SSELTSSTND 171 



Query: 180 VSQAAANEQVSPAPWSITSEVPAAKEEVKPTQTSVSQSTTVSPASVAAETPAPVAKVAP 239 
VS ++E V PAE TVT+SA+APP + 

Sbjct: 172 VSTELSSESQKQPEVPQEAVPTPKAAE TTEVEPKTDISEAPTSANRPVPNESASE 225 



Query: 240 VRTVAAPRVASVKWTPKVETGASPEHVSAPAVP VTTTSPATDSKLQATEVKSVPVA 296 

+ AAP + A E SAPA TTS AT + L 

Sbjct: 227 EVSSAAP AQAPAEKEETSAPAAQKAVADTTSVATSNGL 264 

Query: 297 QKAPTATPVAQPASTTNAVAAHPENAGLQPHVAAYKEKVASTYGVNEFSTYRAGDPGDHG 356 

AP A +P NAGLQP AA+KE+VAS +G+ FS YR GDPGDHG 

Sbjct: 265 SYAPNH AYNPMNAGLQPQTAAFKEEVASAFGITSFSGYRPGDPGDHG 311 

Query: 357 KGIAVDFIVGTNQALGNKVAQYSTQNMAANNISYVIWQQKFYSNTNSIYGPANTWNAMPD 416 

KGLA+DF+V N ALG++VAQY+ +MA ISYVIW+Q+FY+ SIYGPA TWN MPD 

Sbjct: 312 KGLAIDFMVPENSALGDQVAQYAIDHMAERGISYVIWKQRFYAPFASIYGPAYTWNPMPD 371 



Query: 417 RGGVTANHYDHVHVSFN 433 

RG +T NHYDHVHVSFN 
Sbjct: 372 RGSITENHYDHVHVSFN 388 



A related GBS gene <SEQ ID 8539> and protein <SEQ ID 8540> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 3 
SRCFLG: 0 
McG: Length of UR: 20 
Peak Value of UR: 1.96 
Net Charge of CR: 2 
McG: Discrim Score: 2.95 
GvH: Signal Score (-7.5): 3.84 

Possible site: 23 
>>> Seems to have a cleavable N-term signal seq. 
Amino Acid Composition: calculated from 24 
ALOM program count: 0 value: 4.29 threshold: 0.0 
PERIPHERAL Likelihood =4.29 58 
modified ALOM score: -1.36 



*** Reasoning Step: 3 
Rule gpol 

Final Results 

bacterial outside --- Certainty=0.3000 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 

SEQ ID 8540 (GBS322) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 77 (lane 9; MW 52kDa). The GBS322-His fusion product was purified (Figure 
214, lane 10) and used to immunise mice. The resulting antiserum was used for FACS (Figure 267), which 
confirmed that the protein is immunoaccessible on GBS bacteria. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 296 

A DNA sequence (GBSx0324) was identified in S.agalactiae <SEQ ID 955> which encodes the amino acid 
sequence <SEQ ID 956>. Analysis of this protein sequence reveals the following: 

Possible site: 23 

>>> Seems to have an uncleavable N-term signal seq 

INTEGRAL Likelihood = -1.86 Transmembrane 5 - 21 ( 4 - 21) 

Final Results 

bacterial membrane --- Certainty=0.1744 (Affirmative) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAC46072 GB:U50357 zoocin A endopeptidase [Streptococcus 
zooepidemicus] 

Identities = 163/274 (59%) , Positives = 196/274 (71%) , Gaps = 11/274 (4%) 

Query: 25 VIiADTYWPIDNGRITTGFNGYPGHCGvnYAVCTGTIIRAVADGTVKFAGAGANFSWMTD 84 

V A TY RP+D G ITTGFNGYPGH GVDYAVP GT +RA.VA+GTVKFAG GAN WM 
Sbjct: 21 VSAATYTRPLDTGNITTGFNGYPGHVGVDYAVPVGTPVRAVANGTVKFAGNGANUPWMLW 80 

Query: 85 IAGNCVMIQHADGMHSGYAHMSRWARTGEKVKQGDIIGYVGATGMATGPHLHFEFLPAN 144 

+AGNCV+IQHADGMH+GYAH+S++ T VKQG IIGY GATG TGPHLHFE LPAN 
Sbjct: 81 MAGNCVLIQHADGMHTGYAHLSKISVSTDSTVKCGQIIGYTGATGQVTGPHLHFEMLPAN 140 

Query: 145 PNFQNGFHGRINPTSLIANVATFSGKTQASAPSIKPLQSAPVQNQSSKLKVYRVDELQKV 204 

PN+QNGF GRI+PT IAN F+G T + P N LK+Y+VD+LQK+ 
Sbjct: 141 PNWQNGFSGRIDPTGYIANAPVFNGTTPTE PTTPTTN LKIYKVDDLQKI 189 

Query: 205 NGVWLVXNNTLTPTGFDWNDNGIPASEIDE^ 264 

NG+W V+NN L PT F W DNGI A ++ EV +NG T+DQVLQKGGYF+ NP +K+V 
Sbjct: 190 NGIWQVRNNILVPTDFTOIVDNGIAADDVIE\'TSNGTRTSDQVLQKGGYFVINPNNVKSVG 249 

Query: 265 KPIQGTAGLTWAKTRFANGSSVWLRVDNSQELLY 298 

P++G+ GL+WA+ F G +VWL + LLY 
Sbjct: 250 TPMKGSGGLSWAQVNFTTGGNVWLNTTSKDNIjLY 283 

No corresponding DNA sequence was identified in S.pyogenes. 

A related GBS gene <SEQ ID 8541> and protein <SEQ ID 8542> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 6 
McG: Discrim Score: 6.63 
GvH: Signal Score (-7.5): -2.97 

Possible site: 23 
>>> Seems to have an uncleavable N-term signal seq 
ALOM program count: 1 value: -1.86 threshold: 0.0 

INTEGRAL Likelihood = -1.86 Transmembrane 5 - 21 ( 4-21) 

PERIPHERAL Likelihood =5.57 50 
modified ALOM score: 0.87 

*** Reasoning Step: 3 

Final Results 

bacterial membrane --- Certainty=0.1744 (Affirmative) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the databases: 

GP|2804351|gb|AAC46072.1||U50357(21 - 283 of 285) zoocin A endopeptidase {Streptococcus zooepidemicus} 
%Match = 34.2 
%Identity = 61.3 %Similarity = 74.4 
Matches = 163 Mismatches = 65 Conservative Sub.s = 35 

144 174 204 234 264 294 324 354 

W*VFLS*LRYTTYILiCTFLFIKPPKYSSR*VLFLIF*FKFSNKLIASV*ALHYINSIVIRFFLNKWLVKASSLVVLGGMV 

MKRIFFAFLSLCLF 

10 

384 414 444 474 504 534 564 594 

LSAGSRVLADTYVRPIDNGRIT ^ M 

IFGTQTVSAATYTRPLDTGNITTGFNGYPGHVGVDYATO^^ 

30 40 50 60 70 80 90 

624 654 684 714 744 774 804 834 

HSGYAHMSRWARTGEKVKQGDIlGYVGATGMATGPHLHFEFLPANPNFQNGFHGRINPTSLIflNVATFSGKTQASAPSI 

|:iiii"i» i 1 1 1 1 1 1 1 1 mi mini iiiiibiiii 111 = 11 in hi i 

HTGYAHLSKISYSTDSTVKQGQIIGYTGATGO\TGPHLHFEML?ANPNV/QNGFSGRIDPTGYIANAPVFNGTT 

110 120 130 140 150 160 

864 894 924 954 984 1014 1044 1074 

KPLQSAPVQNQSSKLKVYRvDELQKVNGvra,VKNOT^ 

I : I ' =: || = | = ||:|1|:||:| Ml III INI | =.= || >|| I : I I I I | I I I I I = II 
-P--TEP-TTPTTMjKIYKTODLQKINGIWQVFJSINILTO 

180 190 200 210 220 230 240 


1104 1134 1164 1194 1224 1254 1284 1314 

TLKTVEKPIQGTAGLTWAKTRFANGSSVWLRVDNSQELLYK*FSVLIHCFK*QLCY*LSTISIMILKIIL*SSKV'*YYSL 

= 1 = 1 |::|: 11=11= I I =111 = III 
NVKSVGTPMKGSGGLSWAQVNFTTGGNVWLNTTSKDNLLYGK 
260 270 280 
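
The summary block for the zoocin A match lists Matches = 163, Mismatches = 65 and Conservative Sub.s = 35 alongside %Identity = 61.3 and %Similarity = 74.4. The text does not state which denominator the percentages use. The sketch below searches for an aligned length that reproduces both figures, under the assumptions (mine, not the document's) that identity is matches divided by aligned length and that similarity adds the conservative substitutions to the matches. The function name `implied_aligned_length` is hypothetical.

```python
def implied_aligned_length(matches, mismatches, conservative,
                           pct_identity, pct_similarity, tol=0.05):
    """Smallest aligned length consistent with both reported percentages.

    Assumes identity = matches / length and
    similarity = (matches + conservative) / length; returns None if no
    length in the searched range reproduces both values within `tol`.
    """
    counted = matches + mismatches + conservative
    for length in range(counted, 2 * counted):
        identity = 100.0 * matches / length
        similarity = 100.0 * (matches + conservative) / length
        if (abs(identity - pct_identity) <= tol
                and abs(similarity - pct_similarity) <= tol):
            return length
    return None

# Figures taken from the summary block above.
length = implied_aligned_length(163, 65, 35, 61.3, 74.4)
print(length)
```

Under these assumptions the search returns 266, three positions more than the 263 counted as matches, mismatches and conservative substitutions, which would be consistent with a few gap positions in the alignment. This is an inference from the numbers only.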

SEQ ID 8542 (GBS36) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 11 (lane 4; MW 34.1kDa). 

GBS36-His was purified as shown in Figure 192, lane 7. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 297 

A DNA sequence (GBSx0325) was identified in S.agalactiae <SEQ ID 957> which encodes the amino acid 
sequence <SEQ ID 958>. This protein is predicted to be phosphoribosylaminoimidazolecarboxamide 
formyltransferase/IMP cyclohyd. Analysis of this protein sequence reveals the following: 

Possible site: 48 

>>> Seems to have no N-terminal signal sequence 
Final Results 

bacterial cytoplasm --- Certainty=0.2815 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:BAB04352 GB:AP001509 phosphoribosylaminoimidazolecarboxamide 
formyltransferase/IMP cyclohydrolase [Bacillus halodurans] 
Identities = 310/515 (60%) , Positives = 390/515 (75%) , Gaps = 4/515 (0%) 

Query: 1 MTKI^ISVSDKSGIIBFAKELKNIaGWDIISTGGTKVALDDAGVETIAIDDVTGFPEMMD 60 

M +RRL+SVS+K GI+ FAK L +I+STGGTK AL +AG+ I DVTGFPE++D 

Sbjct: 1 MKSmLVSVSNKEGIVPPAKALVEHEVEIVSTGGTKRALQEAGIPVTGISDVTGFPEILD 60 

Query: 61 GRVKTLHPNIHGGLLARRDADSHLQAARDNNIEM 120 

GRVKTLHPNIHGGLLA R+ D HL ID VWNLYPF++TI +P+ T+ A+ 

Sbjct: 61 GRWTLHPNIHGGLIJ^ERDEHIAQI^HHIRPIDPVWKLyPFQQTIAKSEATFADAr 120 

Query: 121 ENIDIGGPSMLRSAAKimSVTWVDSADVATVMEr^ASQTTFKTRQRIAAKAFRHTA 180 

ENIDIGGPSMLR+AAKNH VTVWD DY TVL ELAD +T++RLAAK FRHTA 

Sbjct: 121 ENIDIGGPSMLR&AW^QH\mroVD!™ 180 

Query: 181 AmALIAEYFTAQVGEAKPEKLTITYDLKQAMRYGENPQQDADFYQKALPTDYSIASAKQ 240 

AYDA+IAEY T VGE PE LT+T++ KQ +RYGENP Q A FYQK L SIA AKQ 
Sbjct: 1S1 AYDAMIAEYLTDAVGEESPESLTWTFEKKQDLRYGENPHQKATFYQKPLGAKASIAHAKQ 240 

Query: 241 I^GKELSE'NNIRDADAAlRIIPJDFKDSPTWALKHIffilPCGIGQADDIETAWDYAYEADPV 300 

L+GKELS+NNI DADAA+ I+++FK+ P VA+KHMNPCG+G + 1+ A+D AYEADPV 
Sbjct: 241 LHGKELSYNIIINDADAALSIVKEFKE-PAAVAVK^ 239 

Query: 301 S I FGGI WLNREVDAATAEKMHPI FLE 1 1 IAPSYSEEALAILTNKKKNLRILELPFDAQA 360 

SIFGGI+ LNREVD TA+ + IFLEIIIAPS+SEEAL +LT+ KKNLR+L LP + + 
Sbjct: 300 SIFGSIIALNREVDVETAKTLKElFLEIlIAPSFSEES^DVLTS-KKMLRLLTLPIiNEE- 357 

Query: 361 ASEVEAEYTGWGGLLVQNQDWAENPSDWQVVTDRQPTEQFATAtEFAWKAIKWKSWG 420 

++ E T + GG LVQ +D ++ ++ T R+PTE E AL+ AW+ +K+VKSN 

Sbjct: 358 -NQAEKRITS IHGGAIiVQEEDTYGFEEAEI KI PTKREPTEAEWEALKLATOWKHVKSNA 416 

Query: 421 IIITNDHMTLGLGAGQTNRVGSVKIAIEQAKEHLEGAVLASDAFFPFADNIEEIAAAGIK 480 

1++ 4 MT+G+GAGQ NRVG+ KIAIEQA + G+V+ SDAFFP D +E A AGI 
Sbjct: 417 IVLADGQMTVGVGaGQIWRVGAAKIAIEQAGEKAAGSVMGSDAFFPMGDTVELAAKAGIT 476 

Query: 481 AIIQPGGSVRDQESIDAANKHGLTMIFTGVRHFRH 515 

AIIQPGGS+KD+ESI+ A+KHG+ M+FTGVRHF+H 
Sbjct: 477 AIIQPGGSIRDEESIENADKHGIAMVFTGVRHFKH 511 

A related DNA sequence was identified in S.pyogenes <SEQ ID 959> which encodes the amino acid 
sequence <SEQ ID 960>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.2932 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

An alignment of the GAS and GBS proteins is shown below: 

Identities = 500/515 (97%) , Positives = 507/515 (98%) 

Query: 1 MTKRALISVSDKSGIIDFAKELKI^VTOIISTGGT 60 

MTKRALI SVSDKSGI +DFAKEI1KNLGWDI 1STGGTKV LDDAGVETIAIDDVT FPEMMD 
Sbjct: 1 MTKIlALISVSDKBGIVDFAKELKlILGfJDIISTGGTKVTLDDAGVETIAIDDVTRFPEMMD 60 

Query: 61 GRvXTIfiPNIHGGLLARPJDADSHLQAATONN^ 120 
Query: 121 ENIDIGGPSMIJ^SAAKNHASVWWDSADYATVLGELADASQTTFKTRQRLAAKRFRHTA 180 

ENIDIGGPSMIjRSAAKNHASVTWVD ADYATVLGELADA QTTF+TRQRLAAK FRHTA 
Sbjct: 121 ENIDIGGPSMLRSAAKNHASVTVVVDPADYATVLGEIADAGQTTFETRQR]1AAKVFRHTA 180 

Query: 181 AYDALIAEYFTAQVGEAKPEKLTITYDLEQAMRYGENPQQDADFYQKALPTDYSIASAKQ 240 

AYDALIAEYFT QVGEAKPEKLTITYDLKQAMRYGENPQQDADFYQKALPTDYS IASAKQ 
Sbjct: 181 AYDALIAEYFTTQVGEAKPEKLTITYDLKQAMRYGENPQQDADFYQKALPTDYSIASAKQ 240 

Query: 241 LNGKELSF1OTIRDADAAIRIIRDFKDSPTWALKHMNPCGIGQADDIETAWDYAYEADPV 300 

LNGKELS FNNIRDADAAIRI IRDFKD PTWALKHMNPCGIGQADDIETAWDY Y+ADPV 
Sbjct: 241 LNGKELSFmiRDADAAIRIIRDFKDRPTWALKHMNPCGIGQADDIETAWDYTYKADPV 300 

Query: 301 SIFGGIWLNREVDAATAEKMHPIFLEI I IAPSYSEEALMLTNKKKHLRILELPFDAQA 360 

SIFGGI+VLMREVDAATA+KMHPIFLEI I IAPSYSEEAIAILTNKKKNLRILELPFDAQA 
Sbjct: 301 SIFGGIIVLNREVDAATAKKMHPIFLEIIIAPSYSEEALAILTNKKKNLRILELPFDAQA 360 

Query: 361 ASEVEAEYTGWGGLLVQNQDWAENPSDWQWTDRQPTEQEATALEFAWKAIKYVKSNG 420 

ASEVEAEYTGWGGLLVQNQDWAENPSDWQ\T7TDRQPTEQEATALEFAWKAIKYVKSNG 
Sbjct: 361 ASEVEAEYTGWGGLLVQNQDVVAENPSDWQVVTDRQPTEQEATALEFAWKAIKYVKSNG 420 

Query: 421 IIITNDHMTLGLGAGQTNRVGSVKIAIEQAKDHLDGAVLASDAFFPFADNIEEIAAAGIK 480 

IIITNDHMTLGLGAGQTNRVGSVKIAIEQAKDHLDGAVIASDAFFPFADNIEEIAAAGIK 
Sbjct: 421 IIITNDHMTLGLGAGQTNRVGSVKIAIEQAKDHLDGAYLASDAFFPFADNIEEIAAAG1K 480 

Query: 481 AIIQPGGSVRDQESIDAANKHGLTMIFTGVRHFRH 515 

AI IQPGGSVRDQ+SIDAANKHGLTMI FTGVRHFRH 
Sbjct: 481 AIIQPGGSVRDQDSIDAANKHGLTMI FTGVRHFRH 515 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 298 

A DNA sequence (GBSx0326) was identified in S.agalactiae <SEQ ID 961> which encodes the amino acid 
sequence <SEQ ID 962>. This protein is predicted to be similar to antibiotic resistance protein. Analysis of 
this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.1842 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAB12342 GB:Z99106 similar to antibiotic resistance protein 
[Bacillus subtilis] 
Identities = 65/263 (24%), Positives = 117/263 (43%), Gaps = 34/263 (12%) 

Query: 5 KNLEIVESIFGD-WDETIIWSCV-C^IMGEVFVDSLDQPKSSLAKLGRKSSFGFLAGQPT 62 

K ++++F D + T ++S + Q I G V+ D PKS +G +S F+AG 
Sbjct: 10 KKYSSLKTMFDDKYCPTFVYSILDQTIPGAVYADDQTFPKSFF- -IGTESGIYFIAGDQG 67 

Query: 63 LFLLEVCSGEDIILVPQHKGWSDLIESTYGQNAHSFKRYATKKDTLFERS 112 

+ +V S + L W +++ + + +R A + 
Sbjct: 68 NRDFHDFIAGYYEEQVKSSKRFTLFSSSDTKDSVLKPILKDDLNQMRRAAFSY QP 122 

Query: 113 RLEKFVTQLPNGFELRAIDEKV YNS CLE KEWSQDLVANYATYQYYKKQGIGYW 165 

+ K QLP G L+ IDE + +NS +E+ + + + +G G+ V 

Sbjct: 123 KSFKKTLQLPKGLVLKRIDEDI I SHSTAFNSAYYEEY WNSVSQFASKGFGFAV 175 
Query: 167 YYQGNIIAGASSYSTYKNGIEIEVDTHPDFRRRGBATIVAAQLILTCLDKGIYPSWDAH- 225 

+ ++++ +S N E+++ T ++R GLA VA + I C++ GI PSWD 

Sbjct: 176 mGNHWSECTSIFLGHNRAEMD-YTLEEYRGLGLAYCVANEFIAFCMENGIVPSWDCDI 235 


Query: 226 -TRTSLNLSEKLGYEFSHEYIAY 247 

+S+ L+ KLG++ EY Y 
Sbjct: 236 CNWSSIALAAKLGFKTVTEYTIY 258 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 299 

A DNA sequence (GBSx0328) was identified in S.agalactiae <SEQ ID 963> which encodes the amino acid 
sequence <SEQ ID 964>. This protein is predicted to be phosphoribosylglycinamide formyltransferase 
homolog (purN). Analysis of this protein sequence reveals the following: 

Possible site: 48 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.0736 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

A related DNA sequence was identified in S.pyogenes <SEQ ID 965> which encodes the amino acid 
sequence <SEQ ID 966>. Analysis of this protein sequence reveals the following: 

Possible site: 48 
>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -0.53 Transmembrane 75 - 91 ( 75 - 91) 


Final Results 

bacterial membrane --- Certainty=0.1213 (Affirmative) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 


The protein has homology with the following sequences in the databases: 

>GP:CAA04374 GB:AJ000883 purD [Lactococcus lactis] 
Identities = 236/419 (56%) , Positives = 301/419 (71%) , Gaps = 7/419 (1%) 

Query: 50 LKLLWGSGGREHAIAKKLLASKGVDQVFVAPGNDGMTLDGLDLVNIWSEHSRLIAFAK 109 

+K+LV+GSGGREHA+AKK + S V++VFVAPGN GM DG+ +V+I + +L+ FA+ 
Sbjct: 1 MKILVIGSGGREHALAKKFMESPQVEEVFVAPGNSGMEKDGIQIVHISELSNDKLVKFAQ 60 

Query: 110 ENEISWAFIGPDDAIAAGIVDDFNSAGLRAFGPTKAAAELEWSICDFAKEIMVKYNVPTAR 169 
I F+GP+ AL G+VD F A L FGP K AAELE SKDFAK IM KY VPTA 

Sbjct: 61 NQNIGLTFVGPETALMNGVVDAFIKAELPIFGPHKMAAELEGSKDFAKSIMKKYGVPTAD 120 

Query: 170 YGTFSDFEKAKAYIEEQGAPIVVKADGLALGKGVWAETVEQAVEAAQEMLLDNKFGDSG 229 
YTF E A AY++E+G P+V+KADGLA GKGV VA +E A A ++ F S 

Sbjct: 121 YATFDSLEPALAYLDEKGVPLVIKADGlftAGKGTOAFDIETAICSALADI FSGSQ 175 

Query: 230 ARWIEEFLDGEEFSLFAFANGDKFYIMPTAQDHKRAFDGDKGPNTGGMGAYAPVPHLPQ 289 

+WIEEFLDGEEFSLF+F + K Y MP AQDHKRAFD DKGPNTGGMGAY+PV H+ + 
Sbjct: 176 GKWIEEFLDGEEFSLFSFIHDGKIYPMPIAQDHKRAFDEDKGPNTGGMGAYSPVLHISK 235 


Query: 290 SVVDTAVEMIVRPVLEGMVAEGRPYLGVLT/GLILTADGPKVIEFNSRFGDPETQIILPR 349 

W+ A+E +V+P + GM+ EG+ + GVLY GLILT DG K IEFN+RFGDPETQ++LPR 
Sbjct: 236 EVVNEALEKWKPTVAGMIEEGKSFTGVLYAGLILTEDGVKTIEFNARFGDPETQVVLPR 295 
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Query: 350 LTSDFAQNIDDIMMGIEPYITWQKDGVTLGWA3EGYPPDYEKGVPLPEKTDGDIITYY 409 

L SD AQ I DI+ G EP + W + GVTLGWVA+EGYP + G+ LPE +G + YY 
Sbjct: 296 LKSDIAQAIIDIIAGNEPTIiEWLESGVTI^VWAREGYPSQAKLGLILPEIPEG-IiNVYY 354 

Query: 410 AGVKFSENSELLLSNGGRVYMLVTTEDSVKAGQDKIYTQLAQQDTTGLFYRNDIGSKRI 468 

AGV +EN++ L+S4GGRVY++ T + VK+ Q +Y +L + + G FYR+DIGS+AI 
Sbjct: 355 AGVSKNENHQ-LISSGGRVYLVSETGEDVKSTQKLLY3KLDKLENDGFFYRHDIGSRAI 412 

An alignment of the GAS and GBS proteins is shown below: 

Identities = 172/182 (94%) , Positives = 176/182 (96%) 



Query: 1 MKIAVFASGNGSNFQVIAEQFQVSFVTSDHRDAYVLERAQNLAIPSFAFELKEFENKAAY 60 
MKIAVFASGNGSNFQVTAEQF VSFVFSDHRDAYVLERAQNLAIPSFAFELKEFENK AY 
Sbjct: 1 MKIAVFASGNGSNFQVIAEQFPVSFVFSDHRDAYVLERAQNLAIPSFAFELKEFENKVAY 60 



Query: 61 EQAWDLLDKHEIDLVCLAGYMKIVGETLLSAYEGRIINIHPTYLPEFPGAHGIKDAWEA 120 

EQA+VDLLDKHEIDLVCLAGYMKIVGETLL AYE RIINIHP YLPEFPGAHGI+DAWEA 
Sbjct: 61 EQAIVDLLDKHEIDLVCIAGYMKIVGETLLLAYERRI INIHPAYLPEFPGAHGIEDAWEA 120 

Query: 121 GVDQSGVTIHWVDSGVDTGQVIQQVHVPRLADDSLESFETRIHETEYQLYPAVLDSLGIK 180 

GVDQSGVTIHWVDSGVDTGQVIQQV VPRLADDSLESFETRIHETEYQLYPAVLDSLG++ 
Sbjct: 121 GVDQSGVTIHWVDSGVDTGQVIQQVRVPRLADDSLESFETRIHETEYQLYPAVLDSLGVE 180

Query: 181 RK 182 

Sbjct: 181 RK 182 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
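
The identity and positive scores quoted in the alignment headers can be rechecked from the raw counts. The following is a minimal sketch in Python, not part of the original analysis: the helper name `parse_summary` is hypothetical, and it assumes that the printed percentage is the ratio truncated to a whole number, which reproduces the figures printed for the GAS/GBS alignment above (172/182 and 176/182).

```python
import re

# Matches "Identities = 172/182", "Positives = 176/182", "Gaps = 4/338".
SUMMARY_RE = re.compile(r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)")


def parse_summary(line):
    """Return {label: (count, aligned_length, percent)} for one header line.

    Assumption: the percentage is truncated (floor), not rounded.
    """
    stats = {}
    for label, count, length in SUMMARY_RE.findall(line):
        count, length = int(count), int(length)
        stats[label] = (count, length, (100 * count) // length)
    return stats


summary = parse_summary("Identities = 172/182 (94%) , Positives = 176/182 (96%)")
for label, (count, length, percent) in summary.items():
    print(f"{label}: {count}/{length} = {percent}%")
```

A recomputed value that disagrees with the printed one is a hint that a digit was damaged during text extraction, although the truncation assumption should be checked before relying on it.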



Example 300 

A DNA sequence (GBSx0329) was identified in S.agalactiae <SEQ ID 967> which encodes the amino acid 

sequence <SEQ ID 968>. Analysis of this protein sequence reveals the following: 

Possible site: 52
>>> Seems to have no N-terminal signal sequence
INTEGRAL Likelihood = -0.59 Transmembrane 121 - 137 ( 121 - 137)

Final Results

bacterial membrane --- Certainty=0.1235 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database:

>GP:AAC1S901 GB:AF016634 phosphoribosylformylglycinamide
cyclo-ligase [Lactococcus lactis subsp. cremoris]

Identities = 253/338 (74%), Positives = 288/338 (84%), Gaps = 4/338 (1%) 



Query: 4 KNAYAQSGVDVEAGYEVVERIKKHVARTERAGVMGAIGGFGGMFDLSQTGVKEPVLISGT 63 
+NAYA+SGVDVEAGYEW RI KKHVA+TER GV+GALGGFGG FDLS VKEPVLISGT 
Sbjct: 5 ENAYAKSGVDVEAGYEWSRIKKHVAKTERLGVLGALGGFGGSFDLSVLDVKEPVLISGT 64

Query: 64 DGVGTKLMIAIKYDKHDTIGQDCTOMCVNDIIAAGAEPLYFLDYVATGKNEPAKLEQWA 123 

DGVGTKLMLAI+ DKHDTIG DCVAMCVNDI IAAGAEPLYFLDY+ATGKN P KLEQWA 
Sbjct: 65 DGVGTKLMIAIRADKHDTIGIDCTAt^CVNDIIAAGAEPLYFLDYIATGKNIPEKLEQWA 124 


Query: 124 GVAEGCVQASAALIGGETAEMPGMYGEDDYDLAGFAVGVAEKSQIIDGSK-VKEGDILLG 182 

GVAEGC+QA AALIGGETAEMPGMY EDDYDLAGFAVGVAEKSQ+IDG K V+ GD+LLG 
Sbjct: 125 GVAEGCLQAGAALIGGETAEMPGMYDEDDYTJLAGFAVGVAEKSQLIDGEKDVEAGDVLLG 184 



Query: 183 IASSGIHSNGYSLVRRVFADYTGDE^/LPELEGKQLKDVLLEPTRIYVKAALPLIKEELvN 242

LASSGIHSNGYSLVR+VFAD+ +E LPEL+ + L D LL PT+IYVK LPLIK+ +
Sbjct: 185 LASSGIHSNGYSLWKVFADFDLNESLPELD-QSLIDTLLTPTKIYVTCELLPLIKQNKIK 243

Query: 243 GIAHITGGGFIEOTPRMFMDLAREIDEDKVPVLPIFKALEKYGDIKHEEMFEIFNMGVG 302 

GIAHITGGGF EN+PRMF + L+AEI E VLPIFKALEKYG IKHEEM+EIFNMG+G 
Sbjct: 244 GIAHITGGGFHENLPRMFGNSLSAEI\'EC-SIVD\^FIFI<ALEICY'GSIKHEEMYEIFNMGIG 303 



Query: 303 LMLDVNPENVDRVKELLDEPWEIGRI IKKADDSWIK 340 

+++ V PEN +K+ L+ +EIG+++ + + WIK 
Sbjct: 304 MVIAVAPENAAALKKELN--AFEIGCMVNRQEAPWIK 339 

A related DNA sequence was identified in S.pyogenes <SEQ ID 969> which encodes the amino acid 
sequence <SEQ ID 970>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

--- Certainty=0.3236 (Affirmative) < succ>
--- Certainty=0.0000 (Not Clear) < succ>
--- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below: 

Identities = 321/340 (94%) , Positives = 332/340 (97%) 



Query: 61 SGTDGVGTKLMLA1KYDKHDTIGQDCVAMCVKDIIAAGAEPLYFLDYVATGKNEPAKLEQ 120 

SGTDGVGTKLMLAI KYDKHDTIGQDCVAMCVNDI IAAGAEPIiYFLDY+ATGKN P KLE+ 
Sbjct: 61 SGTDGVGTKLMLAI KXDKHDTIGQDCTAMCTNDIIAAGAEPLYFLDYIATGKMNPVKLEE 120 

Query: 121 WAGVAEGCVQASAALIGGETAEMPGMYGEDDYDLAGFAVGVAEKSQIIDGSKVKEGDIL 180 

W+GVAEGCVQA AALIGGETAEMPGMYG+DDYDIAGFAVGVAEKSQI IDGSKVKEGDIL 
Sbjct: 121 WSGVAEGCVQAGAALIGGETAEMPGMYGQDDYDLAGFAVGVAEKSQIIDGSJCVPCEGDIL 180

Query: 181 LGLASSGIHSNGYSLVRRVFADYTGDEVLPELEGKQLKDVLLEPTRIYVKAALPLIKEEL 240 

LGLASSGIHSNGYSLVRRVFADYTG E+LPELEGKQLKDVLLEPTRIYVKAALPLIKEEL 
Sbjct: 181 LGLASSGIHSKtGYSLVRRVFADYTGKELLPELEGKQLKDVLLEPTRIYVKA&LPLIKEEL 240 

Query: 241 VNGIAHITGGGFIENVPRMFADDLAAEIDEDKVPVLPIFKALEKYGDIKHEEMFEIFNMG 300 

V GI HITGGGFIEN+PFJ«FADDLAAEIDEDKVPVLPIFKALEKYGDIKHEEMFEIFNMG 
Sbjct: 241 VKGIGHITGGGFIENIPRMFADDLAAEIDEDKVPVLPIFJCALEKYGDIKHEEMFEIFNMG 300 

Query: 301 VGLMLDVWPENVDRVE^ELLDEPVYEIGRIIKKADDSWIK 340 

VGLML V+PENV+RVKELLDEPVYEIGRI IKKAD SWIK 
Sbjct: 301 VGLMLAVSPENVNRVKELLDEPVYEIGRI IKKADASWIK 340 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.



Example 301 

A DNA sequence (GBSx0330) was identified in S.agalactiae <SEQ ID 971> which encodes the amino acid 
sequence <SEQ ID 972>. This protein is predicted to be phosphoribosylpyrophosphate amidotransferase 
(purF). Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1112 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database:

>GP:AAD12627 GB:U64311 phosphoribosylpyrophosphate amidotransferase
[Lactococcus lactis]
Identities = 340/470 (72%), Positives = 404/470 (85%), Gaps = 5/470 (1%) 

Query: 3 YEVKSLM3ECGVFGIWGYPQAAQVTYFGLHSLQHRGQSGAGI I SNDNGKLYGYRNVGLLS 62

+E K+LNEECG+FG+WG+P AA++TYFGLH+LQHRGQEGAGI+ N+NGKL +R +GL++ 
Sbjct: 37 FEAKTLNEECGLFGWGHPDAARLTYFGLHALQHRGQEGAGILVNHNGKLNRHRGLGLVT 96 

Query: 63 EVFKNQSELDNLTGNAAIGHVRYATAGSADrRNIQPFLYKFHDGQFALCHNGNLTNAISS 122 

EVF+++ +L+ LTG++AIGHVRYATAGSA+I HIQPF ++FHDG L HNGNLTNA S 
Sbjct: 97 EVFRHEKDLEELTGSSAIGHVRYATAGSANINNIQPFQFEFHDGSLGLAHNGNLTNAQSL 156 

Query: 123 RKELEKO^^IFNASSDTEILMHLIRRSHNPSFMGKVKEALSTVKGGFAYLLMTEDKLIAA 182 

R ELEK GAIF+++SDTEILMHLIRRSH+P FMG+VKEAL+TVKGGFAYL+MTE+ ++AA 
Sbjct: 157 RCELEKSGAIFSSNSDTEILMHLIRRSHHPEFMGRVKFAloNTVKGGFAYLIMTENSIVAA 216 

Query: 183 LDPNAFRPLSIGQMQNGAWVISSETCAFEWGAKWVRDVEPGEVILIDDSGIQCDRYTDE 242 

LDPN FRPLSIG+M NGA V++SETCAF+WGA W++DV+PGE+I I+D GI D++TD 
Sbjct: 217 LDPNGFRPLSIGKMSNGALWASETCAFDWGATWIQDVQPGEIIEINDDGIHVDQFTDS 276 

Query: 243 TQLAICSMEYVYFARPDSTIHGVNVHTARKNMGKRLAQEFKQDADIVIGVPNSSLSAANG 302 

T + I CSMEY+ YFARPDS I GVNVHTARK GK LAQE K DADIVIGVPNSSLSAA G 
Sbjct: 277 TNMTICSMEYIYFARPDSNIAGVNl'HTARKRSGKILAQEAKIDADIVIGVPNSSLSAASG 336 

Query: 303 FAEESGLPNEMGLVKNQYTQRTFIQPTQELREQSVRMKLSAVSGVVKGKRVUMIDDSIVR 362 

+AEESGLP EMGL+KNQY RTFIQPTQELREQGVRMKLSAV GW+GKRV+M+DDSIVR 
Sbjct: 337 YAEESGLPYEMGLIKNQYVARTFIQPTQELREQGVRMKLSAVRGWEGKRVIMVDDSIVR 396 

Query: 363 GTTSRRIVGLLREAGATEVHVAIASPELKYPCFYGIDIQTRRELISANHAVDEVCDIIGA 422 

GTTSRRIV LL++AGA EVHVAIASP IiKYPCFYGIDIQ R EIiI+A H DE+ + IGA 
Sbjct: 397 GTTSRRIVKLLKDAGAAEVHVAIASPALKYPCFYGIDIQDRDELIAATHTTDEIREAIGA 456 

Query: 423 DSLTYLSIDGL1KSIGLETKAPNGGLCVAYFDGHYPTPLYDYEEEYLRSL 472 

DSLTYLS GL+++IG +■ LC++YFDG YPTPLYDYE +YL SL 

Sbjct: 457 DSLTYLSQSGLVEAIG HDKLCLSYFDGEYPTPLYDYEADYLESL 500 

A related DNA sequence was identified in S.pyogenes <SEQ ID 973> which encodes the amino acid 
sequence <SEQ ID 974>. Analysis of this protein sequence reveals the following: 



Possible site: 21
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.0610 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 473/484 (97%), Positives = 481/484 (99%)

Query: 1 MTYEVKSIiNEECGVFGIWGYPQAAQWYFGLHSLQHRGQEGAGIISNDNGKLYGYRNVGL 60 

MTYEVKSLNEECGVFGIWG+PQAAQOTYFGLHSLQHRGQEGAGI+SNDNGKLYGYRNVGL 
Sbjct: 20 MTYEVKSLNEECGVFGIWGHPQAAQ\TYFGLHSLQHRGQEGAGIVSNDNGKLYGYRNVGL 79 

Query: 61 LSEVFKNQSELDNLTGNAAIGHVRYATAGSADIRNIQPFLYKFHDGQFALCHNGNLTNAI 120 

LSEVFKNQSELDNLTGNAAIGHVRYATAGSADIRNIQPFLYKFHDGQFALCHNGNLTNAI 
Sbjct: 80 LSEVFKNQSELDNLTGNAAIGHVRYATAGSADIRNIQPFLYKFHDGQFALCHNGNLTNAI 139 

Query: 121 SSRKELEKCjGAIFNASSDTEILIfflLIRRSHNPSFMGKVKFJVLSTVKGGFAYLLMTEDKLI 180 

S RKELEKQGAIFNASSDTEILMHLIRRSHN SFMGKVKEAL+TVKGGFAYLLMTE+KLI 
Sbjct: 140 SLRKELEKC/^IFNASSDTEILMHLIRRSHNSSFMGKVKEALNW 199

Query: 181 AALDPNAFRPLSIGQMQNGAWVISSETCAFEWGAKWVRDVEPGEVILIDDSGIQCDRYT 240

AALDPNAFRPLSIGQMQNGAWISSETCAFEWGAKHVRDVEPGEVILIDD giqcdryt 
Sbjct: 200 AALDPNAFRPLSIGQMQNGAWVISSETCAFEWGAKWVRDVEPGEVILIDDRGIQCDRYT 259 

Query: 241 DETQLAICSMEYVYFARPDSTIHGVNVHTARKNMGKRIAQEFKQDADIVIGVPNSSLSAA 300

DETQLAICSMEYVYFARPDSTIHGVNVHTARKNMGKRLAQEFKQDADIVIGVPNSSLSAA 
Sbjct: 260 DETQLAICSMEYVYFARPDSTIHGVNVHTARKMMGKRLAQEFKQDADIVIGVPNSSLSAA 319 

Query: 301 MGFAEESGLPNEMGLVKNQYTQRTFIQPTQELRECG^/RMKLEAVSGWKGKRWMIDDSI 360 
MGFAEESGLPNEMGLVKNQYTQRTFIQPTQELRECGVRMKLSAVSGWKGKRWMIDDSI

Sbjct: 320 MGFAEESGLPNEMGLVKNQYTQRTFIQPTQELRECGVRMKLSAVSGWKGKRWMIDDSI 379 

Query: 361 VRGTTSRRIVGLLREAGATEraVAIASPELKYPCFYGIDIQTRRELISANHAVDEVCDII 420 
VRGTTSRRIVGLLREAGA+EVHVAIASPELKYPCFYGIDIQTRRELISANH+VDEVCDII 
Sbjct: 380 VRGTTSRRIVGLLREAGASEVHVAIASPELKYPCFYGIDIQTRRELISANHSVDEVCDII 439

Query: 421 GADSLTYLSIDGLIKSIGLETKAPWGGLCVAYFDGHYPTPLYDYEEEYLRSLEEKTSFYI 480 

GADSLTYLS+DGLI+SIGLETKAPNGGLCVAYFDSHYPTPLYDYEEEYLRSLEEKTSFYI 
Sbjct: 440 GADSLTYLSLDGLIESIGLETKAPNGGLCVAYFDGHYPTPLYDYEEEYLRSLEEKTSFY1 499 


Query: 481 QKVK 484 
QKVK 

Sbjct: 500 QKVK 503 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 302 

A DNA sequence (GBSx0331) was identified in S.agalactiae <SEQ ID 975> which encodes the amino acid 

sequence <SEQ ID 976>. Analysis of this protein sequence reveals the following: 

Possible site: 28
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4797 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 
Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 
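
The "Final Results" blocks in these Examples list one certainty value per candidate location, and the location marked "Affirmative" is the one with the highest value. As an illustration only, the sketch below reads such a block and picks the highest-scoring location. The function names and the regular expression are assumptions introduced here, not part of the prediction program, and the pattern deliberately tolerates stray spaces inside the numbers because the extracted text contains them.

```python
import re

# Assumed line shape: "<location> --- Certainty=<value> (<call>)".
# The dashes are optional and the number may contain stray spaces.
RESULT_RE = re.compile(
    r"(bacterial\s+\w+)\W*Certainty\s*=\s*([0-9][0-9.\s]*?)\s*\(([^)]*)\)"
)


def parse_final_results(text):
    """Return a list of (location, certainty, call) tuples."""
    results = []
    for line in text.splitlines():
        match = RESULT_RE.search(line)
        if match:
            location = " ".join(match.group(1).split())
            certainty = float(re.sub(r"\s+", "", match.group(2)))
            results.append((location, certainty, match.group(3).strip()))
    return results


def predicted_location(results):
    """Return the entry with the highest certainty value."""
    return max(results, key=lambda entry: entry[1])


block = """
bacterial cytoplasm --- Certainty=0.4797 (Affirmative)
bacterial membrane --- Certainty=0. 0000 (Not Clear)
bacterial outside Certainty=0 . 0000 (Not Clear)
"""
results = parse_final_results(block)
print(predicted_location(results))
```

The sample block uses the values reported for GBSx0331 in Example 302, with two of the lines left in their space-damaged form to show that the pattern still reads them.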

Example 303 

A DNA sequence (GBSx0332) was identified in S.agalactiae <SEQ ID 977> which encodes the amino acid 
sequence <SEQ ID 978>. Analysis of this protein sequence reveals the following: 

Possible site: 13
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3489 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes.
Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 304 

A DNA sequence (GBSx0333) was identified in S.agalactiae <SEQ ID 979> which encodes the amino acid 
sequence <SEQ ID 980>. Analysis of this protein sequence reveals the following:

Possible site: 14
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1690 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAC12194 GB:AL445066 phosphoribosylformylglycinamidine synthase
related protein [Thermoplasma acidophilum]
Identities = 199/746 (26%), Positives = 329/746 (43%), Gaps = 103/746 (13%) 

Query: 202 ADD--FAAYKAEQG1AMEVDDLLFIQDYFKSIGRVPTETELKVLDTYWSDHCRHTTFETE 259 
ADD A GLA+ +D++ ++ YF+ +GR P + E+ + WS+HC + + +

Sbjct: 11 ADDARLKAISKRLGLALSLDEMKAVRSYFERLGRDPIDAEIHAVAQSWSEHCSYKSSKYY 70 

Query: 260 LKNIDFSASKFQKQLQATYDKYIAMRDELGRSEKPQTLMDMATIFGRYERANGRLDDMEV 319 
LK K+ L+ Y +AM D+ G 
Sbjct: 71 LK-- KYLGSLKTDYT- ILAMEDDAG 92



Query: 320 SDEINACSVEIEVDVDGVKEPWLLMFKNETHNHPTEIEPFGGA^TCIGGAIRDPLSGRSY 379 

VD DG + + K E+HNHP+ +EP+GGAAT IGG +RD L + 
Sbjct: 93 -VVDFrX3---EYAYVIiKLffiSHNHPSAvEPYGGAATGIGGIVRDVLCMGAQ 138 

Query: 380 VYQAMRISGAGDITTPIAETRAGKLPQQVISKTAAHGYSSYGNQIGLATTYVREYFHPGF 439 

+ GD+++ E G L + I G YGN+IG+ YF + 

Sbjct: 139 PVALIDSLFLGDVSSDRYE GLLSPRYIFGGWGGIRDYGNRIGIPNVAGSLYFDICLY 195 

Query: 440 VAKFJ*IELGAWGAAPKENVTOEKP-EaGDVvVIjLGGKTGRDGVGGATGSSKVQTVESVET 498 

+ + VG ++ +VR K + GDV+VL+GGKTGRDG+ G +S + ++ 

Sbjct: 196 NSNPLVNAGCVG1VRRDRIVRSKSYKPGDVLVLMGGKTGRDGIHGVNFASTTLG-IWTKS 254 

Query: 499 AGAEVQKGNAIEERKIQRLFRDGNVTRLIKKSNDFGAGGVCVAIGELAD GLEIDLD 554 

+ +Q GN I E+ + + + N L1+ D G GG+ A E+ G EI LD 

Sbjct: 255 SRLAIQLGNPIVEQPMIKAVLE^AGLIRAMKDLGGGGLSSAATEMVYAGGFGAEITLD 314 

Query: 555 KVPLKYCK3LNGTEIAISESQERMSVWGPSDVDAFIAACTKENIDAVVVATVTEKPNLVM 614 

+ LK ++G EI ISESQERM + P DV+ K N+D V+ VT + + 

Sbjct: 315 DIKLKESNMSGWEIWISESQERMLMECYPEDTOKIRQIAEKHNLDFSVIGQVTADRRIRV 374 

Query: 615 TWNGETI VDLERCFLDTNGV-RVVVDAK\^KDLTVPFARTTSAETLEADMLKVLSDLNH 573 

+ I+D++ FLD 4- V + K V+K +TVP+ E L + + ++ LN 

Sbjct: 375 YYKKRKI IDMDIEFLDDSPVYQRPYRI KEVEKSVTVPQ EPEDLNSFVRDFMARLNT 430 

Query: 674 ASQKGLQTIFDSSVGRSTVNHPIGGR-YQITPTESSVQKLPVQYGVTTTASVMAQGYNPY 732 

++ + +D +V ST+ P GR + T +++V K P++ + V+ G P 

Sbjct: 431 CARFNVWQYDHTTOGSTIVTPFVGRPNKETHADATVIK-PLENSM--RGLVLTSGSRPN 487 

Query: 733 IAEWSPYHGAAYAVIEATARLVATGADWSRARFSYQEYFERMDKQAERFGQPVSALLGSI 792 

+ PY G + EA +++TG R ++ E GQ V ++ 

Sbjct: 488 MVSVDPYAGTLLTLftEAYKNILSTG GRPHSWDALNFGNPEREEIMGQFVESVRAIG 544 



Query: 793 FAQIQFGLPSIGGKBSMSGTFEELTV?PTI J VAFG\'TTADS-RKVLSPEFKAAGENIY 848 

+ + GLP + G S + + + PT V D R+ + K +G IY

Sbjct: 545 DFCRKMGLPWAGWSFYNEYRKTDIMPTPTIMMVGLIDDVRRSRTTYMKGSGNAIYLIG 604

Query: 849
Sbjct: 605



Query: 891 VLESLALMTFGNRIGASVEIAELDSS 916 

+ +L+ M+FG+ IG V+I+ + ++ 
Sbjct: 659 LFAALSEMSFGSGIGFHVDISNVSAA 684 



A related DNA sequence was identified in S.pyogenes <SEQ ID 981> which encodes the amino acid
sequence <SEQ ID 982>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1415 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 1219/1256 (97%) , Positives = 1226/1256 (97%) 



Query: 11
Sbjct: 2

Query: 71
Sbjct: 62

Query: 131
Sbjct: 122

Query: 191
I NLDFFETYQADDFA YKAEQGLAMEVDDLLFIQ+YFKSIG VPTETELKVLDTYWSDH
Sbjct: 182

Query: 251
CRHTTFETELKNIDFSASKFQKQLQ TYDKYIAMRDELGRSEKPQTLMDMATI FGRYERA
Sbjct: 242

Query: 311
Sbjct: 302

Query: 371
RDPLSGRSYVYQAMRISGAGDITTPIAETRAGKLPQQVISKTAAHGYSSYGNQIGLATTY
Sbjct: 362

Query: 431
VREYFHPGFVAKRMELGAWGAAPKENWREKPEAGDW+LLGGKTGRDGVGGATGSSKV
Sbjct: 422

Query: 491
QTVESVETAGAEVQKGNAIEERKIQRLFRDGNVTRLIKKSNDFGAGGVCVAIGELADGLE
Sbjct: 482

Query: 551
IDLDKVPLKYQGLNGTE IA1 SESQERMS VW P+DVDAFIAACNKENIDAWVATVTEKP
Sbjct: 542

Query: 611
NLVMTWNGE IVDLER FIIDTNGWVVVrjAKVVDKDLTVPEMTTSAETLEAD 1
Sbjct: 602

Query: 671 LNHASQKGLQTIFDSSVGRSTVNHPIGGRYQITETESSVQKLPVQYGVTTTASVMAQGYN 730

IJ^SQKGLQTIFDSSVGRSTVNHPIGGRYQITPTESSVQKLPVQ+GVTTTASVMAQGYN 
Sbjct: 662 LNHASQKGLQTIFDSSVGRSTVNHPIGGRYQITPTESSVQKLPVQHGVTTTASVMAQGYN 721 

Query: 731 PYIAEWSPYHGAaYAVIEATARLVaTGflDWSRARFSYQEYFERMDKQAERFGQPVSALLG 790 

PYIAEWSPYHGAAYAVIFATARLmTGADWSRARFSYQEYFERMDKQAERFGQPVSALLG 
Sbjct: 722 PYIAEWSPYHGAAYAVIEATARLVATGADWSRARFSYQEYFERMDKQAERFGQPVSALLG 781 

Query: 791 SIEAQIQFGLPSIGGKDSMSGTFEELTVPPTLVAFGVTTADSRKVLSPEFKAaGENIYYI 850 

SIEAQIQ GLPSIGGKDSMSGTFE+LTvPPTIA/AFGVTTADSRKVLSPEFKAAGENIYYI 
Sbjct: 782 SIFAQIQLGLPSIGGKDSMSGTFEDLTVPPTLVAFGVTTADSRKVLSPEFKAAGENIYYI 841 

Query: 851 PGQAI SEDIDFDL I KANFSQFFAIQAQHKITAASAVKYGGVLESLALMTFGNRIGASVE I 910 

PGQAI SEDIDFDLI K NFSQFEAIQAQHKITAASA KYGGVLESIALMTFGNRIGASVEI 
Sbjct: 842 PGQAI SEDIDFDLIKDNFSQFFAIQAQHKITAASAAKYGGVLESIALMTFGNRIGASVEI 901 

Query: 911 AELDSSLTAQLGGFVFTSVEEIADWKIGQTQADFTVTVNGNDLAGASLLSAFEGKLEEV 970 

AELDSSLTAQLGGFVFTS EEIAD VKIGQTQADFTVTVNGNDL&GASLL+AFEGKLEEV 
Sbjct: 902 AELDSSLTAQLGGFVFTSAEEIADAVKIGQTQADFTVTVNGNDLAGASLLAAFEGKLEEV 961 

Query: 971 YPTEFEQVDAIEEVPAWSDWI KAKE 1 1 EKPWY I PVFPGTNSEYDSAKAFEQVGASVN 1030 

YPTEFEQ D +EEVPAWSD VIKAKE IEKPWYIPVFPGTNSEYDSAKAFEQVGASVN 
Sbjct: 962 YPTEFEQTDVLEEVPAWSDTVIKAKETIEKPWYIPVFPGTNSEYDSAKAFEQVGASVN 1021 

Query: 1031 LVPFVTLNFAAIAESVDTMVANIAKANI I FFAGGFSAADEPDGSAKFIVNILLNEKVRAA 1090 

LVPFVTLNE AIAESVDTMVANIAKANI IFFAGGFSAADEPDGSAKFIVNILLNEKVRAA 
Sbjct: 1022 LVPFVTLl^VAIAESVDTMVi^IAKANIIFFAGGFSAADEPDGSAKFIVNILIiNEKVRAA 1081 

Query: 1091 IDSFIEKGGLIIGICNGFQALVKSGliLPYGNFEEAGETSPTLFYNDANQHVAKMVETRIA 1150 

IDSFIEKGGLIIGICMGFQALVKSGLLPYGNFBEAGEI'SPTLFYNDAMQHVAKMVETRIA 
Sbjct: 1082 I DS F IEKGGLI IGICNGFQALVKSGLLPYGNFEEAGETSPTLFYNDANQHVAKMVETRIA 1141 

Query: 1151 NTNSPWLAGVEVGDIHVIPVSHGEGKFWSASEFAELRDNGQIWSQYVDFDGQPSMDSKY 1210 

NTNSPWliAGVEVGDIH IPVSHGEGK WSASEFAELRDNGQIWSQYVDFDGQPSMDSKY 
Sbjct: 1142 NTNSPWLAGVEVGDIHAIPVSHGEGKLWSASEFAELRDNGQIWSQYVDFDGQPSMDSKY 1201 

Query: 1211 NPNGSVKAIEGITSKNGQIIGKMGHSERWEDGLFQNIPGNKDQKLFESAVKYFTGK 1266 

NPNGSVMAIEGITSKNGQIIGKMGHSERWEDGLFQNIPGNKDQ LF SAVKYFTGK 
Sbjct: 1202 NPNGSVMAIEGITSKNGQIIGKMGHSERWEDGLFQNIPGNKDQILFASAVKYFTGK 1257 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 



Example 305 

A DNA sequence (GBSx0334) was identified in S.agalactiae <SEQ ID 983> which encodes the amino acid 
sequence <SEQ ID 984>. This protein is predicted to be phosphoribosylaminoimidazole-
succinocarboxamide synthase (purC). Analysis of this protein sequence reveals the following:

Possible site: 41
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4783 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database:

>GP:AAA03540 GB:L15190 SAICAR synthetase [Streptococcus pneumoniae] 
Identities = 183/231 (79%) , Positives = 203/231 (87%) 



Query: 1 MTNQLIYTGKAKDIYSTKDENVIRTVYKDQATMIjNGARKETIDGKGAIiI^QISSLIFEKL 60 
M+ QLIY+GKAKDIY+T+DEN+ 1 + YKDQAT NG +KE I GKG LNNQISS IFEKL

Sbjct: 1 MSKQLIYSGKAKDIYTTEDENLIISTYKDQATAFNGViC<EQIAGKGVIWNQISSFIFEKL 60

Query: 61 NMRGVVTHYIEQISK^Q^KKTOIIPLEVVLRNVTAGSFSKRFGVEEGHVLETPIVEFY 120

N AGV TH++E++S EQLNKKV IIPLEWLRN TAGSFSKRFGV+EG LETPIVEFY 
Sbjct: 61 NAAGVATHFVEKLSDTEQLNKKVKIIPLEWLRNYTAGSFSKRFGVDEGIALETPIVEFY 120

Query: 121 YH^raJTOPFINDEHVKFLGIViroEEIAYLKGETRHINELLKDWFAQIGKNLIDFKLEFG 180 

YKND+L+DPFINDEHVKFL I +D++IAYLK E R INELLK WFA+IGL LIDFKLEFG 
Sbjct: 121 YKNDDLDDPFINDEHVKFLQIADDQQIAYLKEEARRINELLKVWFAEIGLKLIDFKLEFG 180 

Query: 181 FDKDGKIILADEFSPDNCRLWDADGNHMDKDVFRRDLGSLTDVYQWLEKL 231 

FDKDGKIILADEFSPDNCRLWDADGNHMDKDVFRR LG LTDVY++V EKL 
Sbjct: 181 FDKDGKI ILADEFS PDNCRLWDADGNHMDKDVFRRGLGELTDVYE I VWEKL 231 

A related DNA sequence was identified in S.pyogenes <SEQ ID 985> which encodes the amino acid 
sequence <SEQ ID 986>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3935 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 221/234 (94%), Positives = 228/234 (96%) 

Query: 1 MraQLIYTGKAKDIYSTKDEOTIRTVYKDQATMIiNG 60 

+TNQLIY GKAKDIYSTKDENVlRTVYKDQATMLNGARKETIDGKGiALNNQISSLIFEKL 
Sbjct: 11 VTNQLIYKGKAKDIYSTKDEOTIRTWKBQATMIJSGARKETIDGKGAI^QISSLIFEKL 70 

Query: 61 N^GVOTHYIEQISKNEQLNKKOTIIPLEVvLRWTAGSFSKRFGTOEGHVLETPIVEFY 120 

N AGVOTHYIEQISKNEQLNKKTOIIPLEVVLRWrAGSFEKRFGVEEGHVIjETPIVEFY 
Sbjct: 71 NKAGVVTHYIEQISKNEQLNKKVDIIPI^VVLRISIVTAGSFSKRFGVEEGHVLETPIVEFY 130 

Query: 121 YKNDNIMPFINDEHVKFLGIVNDEEIAYLKGETRHINELLKDWFAQIGLNLIDFKLEFG 180 

YKND+L+DPFINDEHVKFLGIVNDEEIAYLKGETR INELLK WFAQIGLNLIDFKLEFG 
Sbjct: 131 YKNDDLDDPFINDEHVKFLGIVNDEEIAYLKGETRRINELLKGWFAQIGLNLIDFKLEFG 190 

Query: 181 FDKDGKIILADEFSPDNCRLWDADGNHMDKDVFRRDLGSLTDVYQVVLEKLIAL 234 

FD++G IILADEFSPDNCRLWD +GNHMDKDVFRRDLG+LTDVYQWLEKLIAL 
Sbjct: 191 FDQEGTIILADEFSPDNCRLWDKNGNHMDKDVFRRDLGNLTDVYQWLEKLIAL 244 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 306 

A DNA sequence (GBSx0335) was identified in S.agalactiae <SEQ ID 987> which encodes the amino acid 
sequence <SEQ ID 988>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2779 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9457> which encodes amino acid sequence <SEQ ID 9458>
was also identified. 

The protein has homology with the following sequences in the GENPEPT database:

>GP:AAC35700 GB:AF041468 acyl carrier protein [Guillardia theta]
Identities = 27/75 (36%) , Positives = 52/75 (69%) 

Query: 12 MSRDEVFEKMLELLRQQLGDPQLDITPESSLHDDI^IDSIALTEFIINLEDVFHLEIPDE 71 
M+ E+FEK+ ++ +QLG + +T +++ +DL DS+ E ++ +E+ F++EIPD+

Sbjct: 1 MNEQEIFEKVQTIISEQLGVDKSQVTKDANFAMDLGADSLDTVELVMAIEEAFKIEIPDD 60 

Query: 72 AVEHMSSVQQLLDYI 86 
A E +S++QQ +D+I 
Sbjct: 61 AAEQI SNLQQAVDFI 75

A related DNA sequence was identified in S.pyogenes <SEQ ID 989> which encodes the amino acid 
sequence <SEQ ID 990>. Analysis of this protein sequence reveals the following: 

Possible site: 24
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1917 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 36/77 (46%) , Positives = 57/77 (73%) 

Query: 12 MSRDEVFEKMLELLRQQLGDPQLDITPESSLHDDIAIDSIALTEFIINLEDVFHLEIPDE 71

M+R E+FE+++ L+++Q + IT ++ L +DLA+DSI L EFIIN+ED FH+ IPDE 

Sbjct: 1 MTRQEIFERIiINLIQKQRSyLSVAITEQTHLKNDLAVDSIELVEFIINVEDEFHIAIPDE 60 

Query: 72 AVEHMSSVQQLLDYI IE 88 
VE M ++ +LDY+++

Sbjct: 61 DVEDMVFMRD I LD YLVQ 77 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 307

A DNA sequence (GBSx0336) was identified in S.agalactiae <SEQ ID 991> which encodes the amino acid 
sequence <SEQ ID 992>. This protein is predicted to be fatty acid/phospholipid synthesis protein (plsX). 
Analysis of this protein sequence reveals the following: 

Possible site: 21
>>> Seems to have no N-terminal signal sequence
INTEGRAL Likelihood = -0.64 Transmembrane 101 - 117 ( 101 - 117)

Final Results

bacterial membrane --- Certainty=0.1256 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9455> which encodes amino acid sequence <SEQ ID 9456> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database:

>GP:CAB13462 GB:Z99112 alternate gene name: ylpD [Bacillus subtilis] 
Identities = 174/329 (52%), Positives = 238/329 (71%), Gaps = 2/329 (0%) 

Query: 8 KIAIDAMGGDYAPKAIVEGVNQAISDFSDIEVQLYGDQKKIEKYLTVT-ERVSIIHTEEK 66 
+IA+DAMGGD+APKA+++GV +1 F D+ + L GD+ IE +LT T +R++++H +E

Sbjct: 2 RIAVDAMGGDHAPKAVIDGVIKGIEAFDDLHITLVGDKTTIESHLTTTSDRITVLHADEV 61

Query: 67 INSDDEPAKAVRRKKQSSMVLGAKAVKCGVAQAFI SAGNTGALLA&GLFWGRI KGVDRP 126

I DEP +AVRRKK SSMVL A+ V + A A ISAGNTGAL+ AGLF+VGRI KG+DRP 
Sbjct: 62 IEPTDEPVRAVRRKKNSSMVLMAQEVAENRADACI SAGNTGALMTAGLFIVGRI KGIDRP 121 

Query: 127 GLMSTMPTLDGVGFDMLDLGANAENTASHLHQYAILGSFYAK1WRGIEVPRVGLLNNGTE 186 

L T+PT+ G GF +LD+GAN + HL QYAI+GS Y++ VRG+ PRVGLLN GTE 
Sbjct: 122 AIAPTLPTVSGDGFLLLDVGANVDAKPEHLVQYAIMGSVYSQQVRGVTSPRVGLLNVGTE 181 

Query: 187 ETKGDSLHKEAYELLAAEPSINFIGNIEARDLMSSVADVVVTDGFTGWAVLKTMEGTAMS 246 

+ KG+ L K+ +++L +INFIGN+EARDL+ VADWVTDGFTGN LKT+EG+A+S 
Sbjct: 182 DKKGNELTKQTFQILKETANINFIGNVEARDLLnDVADVWTDGFTGNVTLKTLEGSALS 241 

Query: 247 IMGSLKSSIKSGGVKAKLGALLLKDSLYQLKDSMDYSSAGGAVLFGLKAPIWCHGSSDS 306 

I ++ + + + +KL A +LK L ++K M+YS+ GGA LFGLKAP++K HGSSDS 
Sbjct: 242 IFiCMMR-DVMTSTLTSKIAAAVLKPKLKEMKMKMKYSNYGGASLFGLKAPVIKAHGSSDS 300 



Query: 307 KAVYSTLKQVRTMLETQWDQLVDAFTDE 335 
AV+ ++Q R M+ V + + +E 
Sbjct: 301 NAVFHAIRQAREMVSQNVAALIQEEVKEE 329

A related DNA sequence was identified in S.pyogenes <SEQ ID 993> which encodes the amino acid 
sequence <SEQ ID 994>. Analysis of this protein sequence reveals the following: 

Possible site: 36
>>> Seems to have no N-terminal signal sequence
INTEGRAL Likelihood = -2.07 Transmembrane 121 - 137 ( 120 - 138)

Final Results

bacterial membrane --- Certainty=0.1829 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



A related sequence was also identified in GAS <SEQ ID 9127> which encodes the amino acid sequence 

<SEQ ID 9128>. Analysis of this protein sequence reveals the following: 

Possible cleavage site: 16
>>> Seems to have no N-terminal signal sequence
INTEGRAL Likelihood = -2.07 Transmembrane 95 - 111 ( 94 - 112)

Final Results

bacterial membrane --- Certainty=0.183 (Affirmative) < succ>
bacterial outside --- Certainty=0.000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 254/330 (76%), Positives = 290/330 (86%)

Query: 6 MKKIAIDAMGGDYAPKAIVEGVNQAISDFSDIEVQLYGDQKKIEKYLTVTERVSIIHTEE 65
MK+IAIDAMGGD APKAIVEGVNQAI FSDIE+QLYGDQ KI YL ++RV+IIHT4E
Sbjct: 27 MKRIAIDAMGGDKAPKAIVEGVNQAIEAFSDIEIQLYGDQTKINSYLIQSDRVAIIHTDE 86

Query: 66 KINSDDEPAKAVPJIKKQSSIWLGAKAVITOVAQAFISAGNTGALIAAGLFVVGRIKGVDR 125
KI SDDEPAKAVRRKK++SMVL AKAVK+G A A ISAGNTGALLA GLFWGRI KGVDR
Sbjct: 87 KIMSDDEPAKAVPJRKKKASrWIAAIQVVKEGKADAIISAGNTGALLAVGLFWGRIKGVDR 146

Query: 126 PGLMSTMPTLDGVGFDMIDLGANAENTASHLHQYAIMSFYAKNVRGIEVPRVGIiNNGT 185
PGL+ST+PT+ G+GFDMLDLGANAENTA HLHQYAILGSFYAKNVRGI PRVGLLNNGT
Sbjct: 147 PGLLSTIPTVTGLGFDMLDLGANAENTAKHLHQYAILGSFYAKNTOGIANPRVGLLNNGT 206

Query: 186 EETKGDSLHKEAVELLAAEPSINFIGNIEARDIjMSSvADVVVTDGFTGNAVLKT^^ 245
EETKGD L K YELL A+ + 1 + F+GN+EAR+LMS VADV+V+DGFTGNAVLK+ +EGTA+
Sbjct: 207 EETKGDPLRKATYELLTADNTISFVG.M\'EARELKSG\7ADVIVSDGFTGNAVLKSIEGTAI 266

Query: 246 SIMGSLKSSIKSGGVKAKLGALLLKD3LYQLKDSMDYSSAGGAVLFGLKAPIVKCHGSSD 305
SIMG LK I SGG+K K+GA LLK SLY++K ++DYSSAGGAVLFGLKAP+VK HGSSD
Sbjct: 267 SIMGQLKQIINSGGIKTKIGASLLKSSLYEMKKTLDY3SAGGAVLFGLKAPWKSHGSSD 326 

Query: 306 SKAVYSTLKQVRTMLETQWDQLVDAFTDE 335 
KA++ST+KQVRTML+T W QLV+ F E

Sbjct: 327 VKAI FSTI KQVRTMLDTNWGQLVEEFAKE 356 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 
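
The "INTEGRAL Likelihood" lines reported for the predicted membrane proteins carry a score and two residue ranges. A minimal sketch for extracting these values is given below. The function name `parse_integral` and the dictionary keys are hypothetical, and the sketch makes no claim about how the two ranges are derived; it simply labels them as the first range and the parenthesized range.

```python
import re

# Assumed line shape:
# "INTEGRAL Likelihood = -2.07 Transmembrane 121 - 137 ( 120 - 138)"
INTEGRAL_RE = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+(?:\.\d+)?)\s+Transmembrane\s+"
    r"(\d+)\s*-\s*(\d+)\s*\(\s*(\d+)\s*-\s*(\d+)\s*\)"
)


def parse_integral(line):
    """Return the score and both residue ranges, or None if the line is damaged."""
    match = INTEGRAL_RE.search(line)
    if match is None:
        return None
    start, end = int(match.group(2)), int(match.group(3))
    return {
        "likelihood": float(match.group(1)),
        "segment": (start, end),
        "parenthesized": (int(match.group(4)), int(match.group(5))),
        "segment_length": end - start + 1,
    }


hit = parse_integral(
    "INTEGRAL Likelihood = -2.07 Transmembrane 121 - 137 ( 120 - 138)"
)
print(hit)
```

Lines that were truncated during text extraction, such as a bare "INTEGRAL Likelihood" with no ranges, return `None` rather than a guessed value.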

Example 308

A DNA sequence (GBSx0337) was identified in S.agalactiae <SEQ ID 995> which encodes the amino acid 
sequence <SEQ ID 996>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4668 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 309 

A DNA sequence (GBSx0338) was identified in S.agalactiae <SEQ ID 997> which encodes the amino acid 
sequence <SEQ ID 998>. Analysis of this protein sequence reveals the following: 

Possible site: 32
>>> Seems to have no N-terminal signal sequence



INTEGRAL Likelihood 

INTEGRAL Likelihood 

INTEGRAL Likelihood 

INTEGRAL Likelihood = -5 

INTEGRAL Likelihood = -3 . 



Transmembrane 
Transmembrane 



Transmembrane 267 - 



Final Results

bacterial membrane --- Certainty=0.6137 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9453> which encodes amino acid sequence <SEQ ID 9454> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAA22372 GB:AL034446 putative transmembrane protein 
[Streptomyces coelicolor A3 (2) ] 
Identities = 47/154 (30%) , Positives = 69/154 (44%) , Gaps = 12/154 (7%) 

Query: 120 SGFVEISSSNSFSFGPFFFLFLAYFIQSLTEEILFRGYVMTr\TKFKGSFAGVLCNSMLF 179 

SG+ E+ S F+A + TEE++FRG + +4- G++ + ++F 

Sbjct: 118 SGYYEVDGLGSVQGAIGLVGFMA- -AAAATESWFRGVLFRI IESHIGTYLftLGLTGLVF 175 



Query: 180 SFIHFRN YGITAIALFNLFLLGIIFSILFNMTKNILFVTGVHTTWNFTMGCVLGN 234 

+H N +G AIA+ F+L ++ T+N+ GVH WNF G V
Sbjct: 176 GLMHLUSEDATLWGAIAIAIEAC3FMrAAAYAA TRNLWLTIGVHFGWNFAAGGVFST 231

Query: 235 KVSGGDSPVSLFRITENSSFALWNGGDFGFEGGV 268 

VSG L T S L GGDFG EG V 

Sbjct: 232 WSGNGDSEGLLDAT-MSGPKLLTGGDFGPEGSV 264 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 310 

A DNA sequence (GBSx0339) was identified in S.agalactiae <SEQ ID 999> which encodes the amino acid 
sequence <SEQ ID 1000>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2665 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9451> which encodes amino acid sequence <SEQ ID 9452>
was also identified. 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:BAB05088 GB:AP001511 unknown conserved protein [Bacillus halodurans] 
Identities = 81/242 (33%), Positives = 124/242 (50%), Gaps = 3/242 (1%) 

Query: 8 GLVLYNRNyREDDKI,VKIFTETBGKRMFFVKHAS--KSKFNAVLQPLTIAHFILKINDNG 65 

G+V+ +Y E +K+V +FT GK + A KS+ AV Q T + + N G 

Sbjct: 7 GIVIRTVDYGESNKIVTVFTREYGKIALMARGAKRPKSRLTAVTQLFTYGMMMFQKNA-G 65

Query: 66 LSYIDDYKEVLAFQETNSDLFKLSYASYITSLABVA1SDNVADAQLFIFLKKTLELIEDG 125 

L + + + +F+E +DLF+ SY SY+T L + D + LF L +T+ + +G 
Sbjct: 66 LGTLTQGEIIQSFREVRNDLFRASYVSYVTDLTNKLTEDEKRNPYLFELLYQTIHYMNEG 125 

Query: 126 LDYEILTNIFEVQLLERFGVALNFHDCVFCHRVGLPFDFSHKYSGLLCPNHYYKDERRNH 185 

+D ++LT IFEV++ G+ CV C +P FS K +G LC KD 

Sbjct: 126 MDPDVLTRIFEVKMFTVAGIKPELDQCVSCRSTDVPVGFSIKEAGFLCKRCIEKDPHAYK 185 

Query: 186 LDPNMLYLINRFQSIQFDDLQTISVKPEMKLKIRQFLDMIYDEYVGIHLKSKKFIDDLSSWG 247 

+ + L+ F L TIS+KPE K ++ + YDEY G+HLKS++F+D L S G 

Sbjct: 186 ITAQVAKLLRLFYHFDLQRLGTISLKPETKATLKTIIHQYYDEYSGLHLKSRRFLDQLESMG 247 

A related DNA sequence was identified in S.pyogenes <SEQ ID 1001> which encodes the amino acid 
sequence <SEQ ID 1002>. Analysis of this protein sequence reveals the following: 

Possible site: 46
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1566 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below: 

Identities = 159/251 (63%), Positives = 210/251 (83%)

Query: 1 mVSQTYGLVLYNRNYREDDKLVKIFTETEGKRMFFVKHASKSKFNAVLQPLTIAHFILK 60
M+++++ G+VL+NRNYREDDKLVKIFTE GK+MFFVKH S+SK ++++QPLTIA FI K
Sbjct: 1 MQLTESLQIVLFNRNYREDDKLVKIFTEVAGKQMFFVKHISRSKMSSIIQPLTIADFIFK 60

Query: 61 INDNGLSYIDDYKEVLAFQETNSDLFKLSYASYITSLADVAISDNVADAQLFIFLKKTLE 120
+ND GLSY+ DY V ++ N+D+F+L+YASY+ +LAD AI+DN +D+ LF FLKKTL+
Sbjct: 61 IM5TGLSYYVDYSNVNTYRYINNDIFRLAYASYVLALADAAIADNESDSH 120

Query: 121 LIEDGLDYEILTNIFEVQLLERFGVALNFHDCVFCHRVGLPFDFSHKYSGLLCPNHYYKD 180
L+E+GLDYEILTNIFE+Q+L+RFG++LNFH+C CHR LP DFSH++S +LC HYYKD
Sbjct: 121 LMEEGLDYEILTNIFEIQILDRFGISLNFHECAICHRTDLPLDFSHRFSAVLCSEHYYKD 180

Query: 181 ERRNHLDPNMLYLINRFQSIQFDDLQTISVKPEMKLKIRQFLDMIYDEYVGIHLKSKKFI 240
RRNHLDPN++YL++RFQ I FDDL+TIS+ ++K K+RQF+D +Y +YVGI LKSK FI
Sbjct: 181 NRRNHLDPNVIYLLSRFQKITFDDLRTISLHKDIKKKLRQFIDELYHDYVGIKLKSKTFI 240

Query: 241 DDLSSWGSIMK 251
D+L WG IMK
Sbjct: 241 DNLVKWGDIMK 251





Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 311 

A DNA sequence (GBSx0340) was identified in S.agalactiae <SEQ ID 1003> which encodes the amino 
acid sequence <SEQ ID 1004>. This protein is predicted to be aromatic amino acid aminotransferase 
(patA). Analysis of this protein sequence reveals the following:

Possible site: 14

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -3.13 Transmembrane 141 - 157 ( 140 - 159) 

Final Results

bacterial membrane --- Certainty=0.2253 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9449> which encodes amino acid sequence <SEQ ID 9450>
was also identified. 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAF06954 GB:AF146529 aromatic amino acid aminotransferase 
[Lactococcus lactis subsp. cremoris] 
Identities = 261/391 (66%), Positives = 323/391 (81%)

Query: 38 MTLEKRFNKYLDRIEVSLIRQFDQSISDIPGMVKLTLGEPDFTTPDHVKEAAKSAIDANQ 97
M L K+FN LD+IE+SLIRQFDQ +S IP ++KLTLGEPDF TP+HVK+A +AI+ NQ
Sbjct: 1 MDLLKKFNPNLDKIEISLIRQFDQQVSSIPDIIKLTLGEPDFYTPEHVKQAGIAAIENNQ 60

Query: 98 SYYTGMSGLLALRQAAADFAKDKYNLTYNPDCEILVTIGATEALSASLIAILEAGDVVLL 157
S+YTGM+GLL LRQAA++F KY L+Y + EILVT+G TEA+S+ L++IL AGD VL+
Sbjct: 61 SHYTGMAGLLELRQAASEFLLKKYGLSYAAEDEILVTVGVTEAISSVLLSILVAGDEVLI 120

Query: 158 PAPAYPGYEPIvmVGADIVEIDTRENDFRLTPEMLETAIIQQGEKLKAVLLNYPTNPTG 217
PAPAYPGYEP++ L G +VEIDTR NDF LTPEML+ AII++ K+KAV+LNYP NPTG
Sbjct: 121 PAPAYPGYEPLITLAGGSLVEIDTRANDFVLTPEMLDQAIIEREGKVKAVILNYPANPTG 180

Query: 218 ITYSRQEIAALAEVLKKYDIFVISDEVYSELTYTGQQHVSIAEYLPNQTILINGLSKSHA 277
+TY+R++I LAEVLKK+++FVI+DEVYSEL YT Q HVSIAEY P QTI++NGLSKSHA
Sbjct: 181 VTYlTOEQIKDl^VLKKHEVFVIADETOSEU^/TDQPHVSIASYAPEQTIVljNGLSKSHA 240

Query: 278 MTGWRVGLVYAPEAFIAQIIKSHQYMVTAASTISQFAGVEALSVGKNDTLPMRQGYIKRR 337
MTGWR+GL++A +AQIIK+HQY+VT+AST SQFA +EAL G +D LPM++ Y+KRR
Sbjct: 241 MTGWRIGLIFAARELVAQIIKTHQYL\^SASTQSQFAAIEALKN6flDDALPMKKEYLKRR 300

Query: 338 DYIIDKMSKLGFKIIKPSGAFYIFAKIPDSYPQDSFKFCQDFAYQQAVAIIPGVAFGKYG 397
DYII+KMS LGFKII+P GAFYIFAKIP QDSFKF DFA + AVAIIPG+AFG+YG
Sbjct: 301 DYIIEKMSALGFKIIEPDGAFYIFAKIPADLEQDSFKFAVDFAKENAVAIIPGIAFGQYG 360

Query: 398 EGYIRLSYAASMEVIETAMARLKVFMESYEG 428
EG++RLSYAASM+VIE AMARL ++ G
Sbjct: 361 EGFVRLSYAASMDVIEQAMARLTDYVTKKRG 391

There is also homology to SEQ ID 1006. 

SEQ ID 1004 (GBS332) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 60 (lane 3; MW 50.7kDa). It was also expressed in E.coli as a GST-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 67 (lane 4; MW 76kDa). 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
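
The percentages printed beside the alignment counts can be checked by hand. The sketch below is an observation about the figures quoted in this section, not a description of how the alignment program computes them: the printed values are reproduced when each percentage is truncated rather than rounded, and when the Positives percentage is taken as the truncated Identities percentage plus the truncated share of the additional (non-identical) positive matches. The function name `summary_percentages` is hypothetical.

```python
def summary_percentages(identities, positives, length):
    """Return (identity %, positives %) using integer truncation.

    Assumption (inferred from the quoted figures only): percentages are
    truncated, and the positives figure is the identity percentage plus the
    truncated share of the extra, non-identical positive matches.
    """
    identity_pct = 100 * identities // length
    extra_pct = 100 * (positives - identities) // length
    return identity_pct, identity_pct + extra_pct


# Counts quoted for the aromatic amino acid aminotransferase alignment:
# Identities = 261/391, Positives = 323/391.
print(summary_percentages(261, 323, 391))  # (66, 81)
```

Plain rounding of 323/391 would give 83%, so the truncating form is the one that matches the printed 81%.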

Example 312 

A DNA sequence (GBSx0341) was identified in S.agalactiae <SEQ ID 1007> which encodes the amino 
acid sequence <SEQ ID 1008>. This protein is predicted to be ribose-phosphate pyrophosphokinase (prsA). 
Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.3118 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9447> which encodes amino acid sequence <SEQ ID 9448> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database: 



Query: 10 LKLFALSSNKELAKKVSQTIGIPLGQSTVRQFSDGEIQVNIEESIRGHHVFILQSTSSPV 69
LK+F+L+SN+ELA+++++ +GI LG+S+V FSDGEIQ+NIEESIRG HV+++QSTS+PV
Sbjct: 10 LKIFSLNSNRELAEEIAKEVGIELGKSSVTHFSDGEIQINIEESIRGCHVYVIQSTSWPV 69

Query: 70 NDNLMEILIMVDALKRASAESVSVVMPYYGYARQDRKARSREPITSKLVANMLEVAGVDR 129
N NLME+LIM+DALKRASA ++++VMPYYGYARQDRKARSREPIT+KLVAN++E AG R
Sbjct: 70 NQNLMELLIMIDALKRASAATINIVMPYYGYARQDRKARSREPITAKLVANLIETAGATR 129

Query: 130 LLTVDLHAAQIQGFFDIPVDHLMGAPLIADYFDRQGLVGDDVVVVSPDHGGVTRARKLAQ 189
++T+D+HA QIQGFFDIP+DHL L++DYF + L GDD+VVVSPDHGGVTRARK+A
Sbjct: 130 MITLDMHAPQIQGFFDIPIDHLNAVRLLSDYFSERHL-GDDLVVVSPDHGGVTRARKMAD 188

Query: 190 CLKTPIAIIDKRRSVTKMNTSEVMNIIGNIKGKKCILIDDMIDTAGTICHAADAIAEAGA 249
LK PIAIIDKRR + N +EVMNI+GN++GK CI+IDD+IDTAGTI AA AL EAGA
Sbjct: 189 RLKAPIAIIDKRR--PRPNVAEVMNIVGNVEGKVCIIIDDIIDTAGTITLAAKALREAGA 246

Query: 250 TAVYASCTHPVLSGPALDNIQNSAIEKLIVLDTIYLPEERLIDKIEQISIAELIGEAIIR 309
T VYA C+HPVLSGPA+ I+ S IEKL+V ++I LPEE+ IDK+EQ+S+A L+GEAI+R
Sbjct: 247 TKWACCSHPVLSGPAMKRIEESPIEKL\nmrSIAL?EEKKIDKMEQI.SVAALLGEAIVR 306

Query: 310 IHEKRPLSPLFE 321
Sbjct: 307 VHENASVSSLFE 318

WO 02/34771 PCT/GB01/04789



A related DNA sequence was identified in S.pyogenes <SEQ ID 1009> which encodes the amino acid 
sequence <SEQ ID 1010>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2685 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below: 

Identities = 298/322 (92%), Positives = 311/322 (96%)

Query: 1 MEEIMSYSNLKLFALSSNKELAKKVSQTIGIPLGQSTVRQFSDGEIQVNIEESIRGHHVF 60
+EE MSYS+LKLFALSSNKELA+KV+ +GI LG+STVRQFSDGEIQVNIEESIRGHHVF
Sbjct: 1 LEEKMSYSDLKLFALSSNKELAEKVASAMGIQLGKSTVRQFSDGEIQVNIEESIRGHHVF 60

Query: 61 ILQSTSSPVNDNLMEILIMVDALKRASAESVSVVMPYYGYARQDRKARSREPITSKLVAN 120
ILQSTSSPVNDNLMEILIMVDALKRASAE +SVVMPYYGYARQDRKARSREPITSKLVAN
Sbjct: 61 ILQSTSSPVNDNLMEILIMVDALKRASAEKISVVMPYYGYARQDRKARSREPITSKLVAN 120

Query: 121 MLEVAGVDRLLTVDLHAAQIQGFFDIPVDHLMGAPLIADYFDRQGLVGDDVVVVSPDHGG 180
MLEVAGVDRLLTVDLHAAQIQGFFDIPVDHLMGAPLIADYFDR GLVG+DVVVVSPDHGG
Sbjct: 121 MLEVAGVDRLLTVDLHAAQIQGFFDIPVDHLMGAPLIADYFDRHGLVGEDVVVVSPDHGG 180

Query: 181 VTRARKLAQCLKTPIAIIDKRRSVTKMNTSEVMNIIGNIKGKKCILIDDMIDTAGTICHA 240
VTRARKLAQ L+TPIAIIDKRRSV KMNTSEVMNIIGN+ GKKCILIDDMIDTAGTICHA
Sbjct: 181 VTRARKLAQFLQTPIAIIDKRRSVDKMNTSEVMNIIGNVSGKKCILIDDMIDTAGTICHA 240

Query: 241 ADAIAEAGATAVYASCTHPVLSGPALDNIQNSAIEKLIVLDTIYLPEERLIDKIEQISIA 300
ADALAEAGATAVYASCTHPVLSGPALDNIQ SAIEKLIVLDTIYLP+ERLIDKIEQISIA
Sbjct: 241 ADALAEAGATAVYASCTHPVLSGPALDNIQRSAIEKLIVLDTIYLPKERLIDKIEQISIA 300

Query: 301 ELIGEAIIRIHEKRPLSPLFEM 322
+L+ EAIIRIHEKRPLSPLFEM
Sbjct: 301 DLVAEAIIRIHEKRPLSPLFEM 322

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 



Example 313 

A DNA sequence (GBSx0342) was identified in S.agalactiae <SEQ ID 1011> which encodes the amino
acid sequence <SEQ ID 1012>. This protein is predicted to be a secreted protein. Analysis of this protein
sequence reveals the following:

Possible site: 20 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3751 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



A related GBS nucleic acid sequence <SEQ ID 9277> which encodes amino acid sequence <SEQ ID 9278> 
was also identified.

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAD00288 GB:U78607 putative secreted protein [Streptococcus mutans]
Identities = 111/157 (70%), Positives = 130/157 (82%), Gaps = 1/157 (0%) 

Query: 1 MTAIKGQVGALESQQSELEAQNAQLEAVSQQLGQEIQTLSNKIVAEHESLKKQVRSAQKG 60 

+ I+GQV AL++QQ+EL+A+N +LEA S LGQ+IQTLS+KIVARNESLK+Q RSAQK
Sbjct: 55 LITIQGQVSALQTQQAELQAENQRLEAQSATLGQQIQTLSSKIVAENESLKQQARSAQKS 114 

Query: 61 NL-TNYIOTIIiNSI<SVSDAVNRWAIREWSANEKMIAQQEADKAALEAKQIENQNAINT 119 

N T+YIN I+NSKSVSDA+NRV AIREWSANEKML QQE DKAA+E KQ ENQ AINT 
Sbjct: 115 NAATSYINAIINSKSVSDAINRVSAIREWSANEKMLQQQEQDKAAVEQKQQENQAAINT 174 

Query: 120 VAANKQAIENNKAALATQRAQLEAAQLELSAQLTTVQ 156 

VAAN++ I N AL TQ+AQLEAAQL L A+LTT Q 
Sbjct: 175 VAANQETIAQNTNALNTQQAQLEAAQLNLQAELTTAQ 211 

There is also homology to SEQ ID 1014. 

A related GBS gene <SEQ ID 8543> and protein <SEQ ID 8544> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 3 
McG: Discrim Score: 8.29 
GvH: Signal Score (-7.5): 0.8 

Possible site: 49 
>>> Seems to have a cleavable N-term signal seq.
ALOM program count: 0 value: 6.74 threshold: 0.0 
PERIPHERAL Likelihood = 6.74 400 
modified ALOM score: -1.85 



Final Results 

bacterial outside --- Certainty=0.3000 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

32.8/56.3% over 439aa 



GP|512521| usp 45 Insert characterized
Lactococcus lactis
PIR|JN0097|JN0097 secreted 45K protein precursor - Insert characterized
ORF00094(301 - 1563 of 1941)
GP|512521|emb|CAA01320.1||A17083(1 - 440 of 461) usp 45 {Lactococcus lactis} PIR|JN0097|JN0097 secreted 45K protein precursor - Lactococcus lactis

%Match =16.5 

%Identity =32.8 %Similarity =56.3 

Matches = 141 Mismatches = 178 Conservative Sub.s = 101 

93 123 153 183 213 243 273 303 

RKYYNFKSNYTLFLFLF*FHYGVIILIE*IEEGYRFLDLIMVHLEIVDFKYKCNNDVI*FREFFGKIFNVLS*RSSLIKM 



KKRILSAVLVSGVTLGTAA--VTWADDFDSKIAATDSVIOTLSGQQAAAQNQVTAIKGQVGALESQQSELEAQNAQLEA 
11=1=11=1=1 I I II I II =1 II 1= 1== =1 II II === =1 =1= =1= :|| |::|: 
KKKIISAIMSTVILSAARPLSGVYAD-INSDIAKQDATISSAQSAKAQAQAQVDSLQS[C\'DSLQQKQTSTKAQIAKIES 



VSQQLGQEIQTLSNKIVARIffiSLKKQVRSAQ-KG^TOTICTI]^ 

:: I =1 11= I I ==l= 1 Mil = 111== ==llll==l = =1 II I lll=:|| ||| :: | 
FAKAIMAQIATLNESiraRTKTLEAQARSAQVNSSATNYMDAVTO 

804 834 864 894 924 954 984 1014

EAKQIENQNAINTVAANKQAIEM^KMLATQRAQLEAAQLELSAQLTWQNEKASLIQAKAQAEEAARKAAEAQAAAEAK 
1 = I : |:::= I =1=1=1= II 1=1 l==l =1= II ll=ll==ll= III 11= 
SQKSETVKKTTCNQFVSLSQSLDSQAQELTSQQAELKVATIj^

170 180 190 200 210 220 230 240 

1044 1065 1095 1125 1155 1185 1215 

AQAEAKAQAESVA- - - KAQAAAQVESATAPTETVQTQPRTE I KPSNLTATSSATTVATTTATATNEPKVTQPSWTKA- - 
: hill II II II == = == I 1= II =l==l= = = ======= =1=1

250 260 270 280 290 300 310 320 

1266 1296 1326 1347 1374 1401 1455 

-VEAPKAWSSTPRAVSKPWRSYDSSNTYPMGQCT WGA-KSMASWGNYW-GNANQWGASARAAG--YSVGTTPRV

: = == =1 I I I II I == I II II == I I = II I I 

mWSGTSTGNTGGTTTGGSGINSSPIGNPyAGGGCTDYTOQYFAAQGIYIRNIMPGNGGQWASNGPAC^VLHWGAAPGV 
330 340 350 360 370 380 390 400 

1503 1533 1563 1593 1623 1653 1683

GAVAVWP YDGGGYGHVAWTSVANNSS IQVMESNYAGNMS IGNYRGSFNPSASGSVYYIYPN* * IIJIRSFWSFLF 

I = I 11111=1 II :: =1 = I I I = 

IASSFSADFVGYANSPYGHVAIVKSVNSDGTITIKEGGYGTTWWGHERTVSASGVTFLMPN 
410 420 430 440 450 460 


SEQ ID 8544 (GBS65) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 5 (lane 6; MW 47.5kDa). It was also expressed in E.coli as a GST-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 13 (lane 3; MW 72kDa) and in Figure 
175 (lane 2 & 3; MW 72kDa). 

The GBS65-GST fusion product was purified (Figure 102A; see also Figure 191, lane 4) and used to
immunise mice (lane 1 product; 20ng/mouse). The resulting antiserum was used for Western blot (Figure 
102B), FACS, and in the in vivo passive protection assay (Table III). These tests confirm that the protein is 
immunoaccessible on GBS bacteria and that it is an effective protective immunogen. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics.

Example 314 

A DNA sequence (GBSx0343) was identified in S.agalactiae <SEQ ID 1015> which encodes the amino 
acid sequence <SEQ ID 1016>. Analysis of this protein sequence reveals the following: 

Possible site: 18 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1184 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 315

A DNA sequence (GBSx0344) was identified in S.agalactiae <SEQ ID 1017> which encodes the amino 
acid sequence <SEQ ID 1018>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4735 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 
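
The "Final Results" blocks in these Examples all follow one pattern: three candidate localisations (cytoplasm, membrane, outside), each with a certainty score, the highest of which is marked Affirmative. A minimal sketch of reading such a block programmatically follows. The function names and the regular expression are assumptions made for illustration; the pattern is deliberately loose so that it tolerates the stray spaces and dash variants found in this transcription.

```python
import re

# Hypothetical pattern: location name, any separator, then a score such as
# "0.1184" that may have been split as "0 . 1184" during text extraction.
ROW = re.compile(r"bacterial\s+(\w+)\W*Certainty\s*=\s*(\d)\s*\.\s*(\d+)")

# Sample rows in the spacing in which they appear for SEQ ID 1016.
SAMPLE = """\
bacterial cytoplasm Certainty=0 . 1184 (Affirmative) < suco
bacterial membrane Certainty=0 . 0000 (Not Clear) < suco
bacterial outside Certainty=0 . 0000 (Not Clear) < suco
"""


def parse_final_results(text):
    """Map each localisation name to its certainty score."""
    scores = {}
    for match in ROW.finditer(text):
        location, whole, fraction = match.groups()
        scores[location] = float(f"{whole}.{fraction}")
    return scores


def predicted_location(text):
    """Return the localisation with the highest certainty."""
    scores = parse_final_results(text)
    return max(scores, key=scores.get)


print(predicted_location(SAMPLE))  # cytoplasm
```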

Example 316 

A DNA sequence (GBSx0345) was identified in S.agalactiae <SEQ ID 1019> which encodes the amino 
acid sequence <SEQ ID 1020>. This protein is predicted to be elongation factor Tu (tufA). Analysis of this 
protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3012 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9737> which encodes amino acid sequence <SEQ ID 9738> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:BAB03851 GB:AP001507 translation elongation factor Tu (EF-Tu) 
[Bacillus halodurans] 
Identities = 302/397 (76%), Positives = 350/397 (88%), Gaps = 2/397 (0%) 

Query: 7 MAKEKYDRSKPHVNIGTIGHVDHGKTTLTAAITTVLARRLPTSVNQPKDYASIDAAPEER 66
MAKEK+DRSK H NIGTIGHVDHGKTTLTAAITTVLA+R V Y +ID APEER
Sbjct: 1 MAKEKFDRSKTHANIGTIGHVDHGKTTLTAAITTVLAKRSGKGVAMA--YDAIDGAPEER 58

Query: 67 ERGITINTAHVEYETEKRHYAHIDAPGHADYVKNMITGAAQMDGAILVVASTDGPMPQTR 126
ERGITI+TAHVEYET+ RHYAH+D PGHADYVKNMITGAAQMDG ILVV++ DGPMPQTR
Sbjct: 59 ERGITISTAHVEYETDNRHYAHVDCPGHADYVKNMITGAAQMDGGILVVSAADGPMPQTR 118

Query: 127 EHILLSRQVGVKHLIVFMNKVDLVDDEELLELVEMEIRDLLSEYDFPGDDLPVIQGSALK 186
EHILLSRQVGV +L+VF+NK D+VDDEELLELVEME+RDLLSEYDFPGDD+PVI+GSALK
Sbjct: 119 EHILLSRQVGVPYLVVFLNKCDMVDDEELLELVEMEVRDLLSEYDFPGDDVPVIRGSALK 178

Query: 187 ALEGDEKYEDIIMELMSTVDEYIPEPERDTDKPLLLPVEDVFSITGRGTVASGRIDRGTV 246
ALEGD ++E+ I+ELM+ VD+YIP PERDT+KP ++PVEDVFSITGRGTVA+GR++RG +
Sbjct: 179 ALEGDAEWEEKIIELMARVDDYIPTPERDTEKPFMMPVEDVFSITGRGTVATGRVERGQL 238

Query: 247 RVNDEVEIVGIKEDIQKAVVTGVEMFRKQLDEGLAGDNVGVLLRGVQRDEIERGQVLAKP 306
V DEVEI+G++E+ +K VTGVEMFRK LD AGDN+G LLRGV R+E++RGQVLAKP
Sbjct: 239 NVGDEVEIIGLEEEAKKTTVTGVEMFRKLLDYAEAGDNIGALLRGVSREEVQRGQVLAKP 298

Query: 307 GSINPHTRFKGEVYILSKEEGGRHTPFFNNYRPQFYFRTTDVTGSIELPAGTEMVMPGDN 366
G+I PHT FK EVY+LSKEEGGRHTPFF+NYRPQFYFRTTDVTG I+LP G EMVMPGDN
Sbjct: 299 GTITPHTNFKAEVYVLSKEEGGRHTPFFSNYRPQFYFRTTDVTGIIQLPDGVEMVMPGDN 358

Query: 367 VTIEVELIHPIAVEQGTTFSIREGGRTVGSGIVSEIE 403
V + VELI PIA+E+GT FSIREGGRTVG+G+V+ I+
Sbjct: 359 VEMTVELIAPIAIEEGTKFSIREGGRTVGAGVVASIQ 395



A related DNA sequence was identified in S.pyogenes <SEQ ID 1021> which encodes the amino acid 
sequence <SEQ ID 1022>. Analysis of this protein sequence reveals the following:

Possible site: 19 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1367 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below: 

Identities = 386/404 (95%), Positives = 396/404 (97%)

Query: 1 MFAFPKMAKEKYDRSKPHVNIGTIGHVDHGKTTLTAAITTVLARRLPTSVNQPKDYASID 60
+FAFPKMAKEKYDRSKPHVNIGTIGHVDHGKTTLTAAITTVIARRLP+SVNQPKDYASID
Sbjct: 12 LFAFPKMAKEKYDRSKPHVNIGTIGHVDHGKTTLTAAITTVIARRLPSSVNQPKDYASID 71

Query: 61 AAPEERERGITINTAHVEYETEKRHYAHIDAPGHADYVKNMITGAAQMDGAILVVASTDG 120
AAPEERERGITINTAHVEYET RHYAHIDAPGHADYVKNMITGAAQMDGAILVVASTDG
Sbjct: 72 AAPEERERGITINTAHVEYETATRHYAHIDAPGHADYVKNMITGAAQMDGAILVVASTDG 131

Query: 121 PMPQTREHILLSRQVGVKHLIVFMNKVDLVDDEELLELVEMEIRDLLSEYDFPGDDLPVI 180
PMPQTREHILLSRQVGVKHLIVFMNKVDLVDDEELLELVEMEIRDLLSEYDFPGDDLPVI
Sbjct: 132 PMPQTREHILLSRQVGVKHLIVFMNKVDLVDDEELLELVEMEIRDLLSEYDFPGDDLPVI 191

Query: 181 QGSALKALEGDEKYEDIIMELMSTVDEYIPEPERDTDKPLLLPVEDVFSITGRGTVASGR 240
QGSALKALEGD K+EDIIMELM TVD YIPEPERDTDKPLLLPVEDVFSITGRGTVASGR
Sbjct: 192 QGSALKALEGDTKFEDIIMELMDTVDSYIPEPERDTDKPLLLPVEDVFSITGRGTVASGR 251

Query: 241 IDRGTVRVNDEVEIVGIKEDIQKAVVTGVEMFRKQLDEGLAGDNVGVLLRGVQRDEIERG 300
IDRGTVRVNDE+EIVGIKE+ +KAVVTGVEMFRKQLDEGLAGDNVG+LLRGVQRDEIERG
Sbjct: 252 IDRGTVRVNDEIEIVGIKEETKKAVVTGVEMFRKQLDEGLAGDNVGILLRGVQRDEIERG 311

Query: 361 VMPGDNVTIEVELIHPIAVEQGTTFSIREGGRTVGSGIVSEIEA 404
Sbjct: 372 VMPGDNVTINVELIHPIAVEQGTTFSIREGGRTVGSGIVSEIEA 415

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 



Example 317 

A DNA sequence (GBSx0346) was identified in S.agalactiae <SEQ ID 1023> which encodes the amino 
acid sequence <SEQ ID 1024>. Analysis of this protein sequence reveals the following: 

Possible site: 36

>>> Seems to have a cleavable N-term signal seq.

INTEGRAL Likelihood = -0.64 Transmembrane 90 - 106 ( 90 - 106)

Final Results

bacterial membrane --- Certainty=0.1256 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 



Example 318 

A DNA sequence (GBSx0347) was identified in S.agalactiae <SEQ ID 1025> which encodes the amino 
acid sequence <SEQ ID 1026>. This protein is predicted to be ftsW. Analysis of this protein sequence 
reveals the following: 



Possible site: 38 

>>> Seems to have no N-terminal signal sequence
INTEGRAL Likelihood =-11.15 Transmembrane - 60 ( 35 - 70)
INTEGRAL Likelihood = -4.73 Transmembrane - 92 ( 74 - 98)
INTEGRAL Likelihood = -3.88 Transmembrane 117 - 133 ( 113 - 134)

Final Results

bacterial membrane --- Certainty=0.5458 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAB39929 GB:U58049 putative cell division protein ftsW 
[Enterococcus hirae] 
Identities = 78/159 (49%) , Positives = 107/159 (67%) , Gaps = 4/159 (2%) 

Query: 1 MANSXYAMSNGGWFGRGLGNSIEKLGYLPEATTDFVFSIVIEELGVIGAGFILALVFFLI 60
M+NS YA+ NGG FGRG+GNSI K GYLPE+ TDF+FS++ EE G+IGA +L L+F L
Sbjct: 240 MSNSYYALYNGGLFGRGMGNSITKKGYLPESETDFIFSVIAEEFGLIGALLVLFLLFLLC 299

Query: 61 LRIMHVGIKAKDPFNSMIALGIGAMLLMQVFVNIGGISGLIPSTGVTFPFLSQGGNSLLV 120
+RI K K+ ++I +G+G +L+Q +NIG I GLIP TGV PF+S GG S L+
Sbjct: 300 MRIFQKSTKQKNQQANLILIGVGTWILVQTSINIGSILGLIPMTGVPLPFVSYGGTSYLI 359

Query: 121 LSVAIGFVLNIDANEKKELIMKEAEEQYKPQEKNEKIIN 159
LS AIG LNI + + KE +++ +QK K++N
Sbjct: 360 LSFAIGLALNISSRQVKE KNKQVERLQLKKPKLLN 394



A related DNA sequence was identified in S.pyogenes <SEQ ID 1027> which encodes the amino acid 
sequence <SEQ ID 1028>. Analysis of this protein sequence reveals the following: 



Possible site: 51 
>>> Seems to have no N-terminal signal sequence
INTEGRAL Likelihood = 
INTEGRAL Likelihood = 
INTEGRAL Likelihood . 
INTEGRAL Likelihood . 
INTEGRAL Likelihood . 
INTEGRAL Likelihood = 
INTEGRAL Likelihood = 
INTEGRAL Likelihood = 



Transmembrane 22 - 
Transmembrane 192 - 
Transmembrane 218 - 
Transmembrane 86 - 
Transmembrane 385 - 
61 - 



212 - 2361 



383 - 402; 



Final Results

bacterial membrane --- Certainty=0.5373 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases:

>GP:CAB59721 GB:AJ250603 FtsW protein [Enterococcus faecium]
Identities = 131/397 (32%), Positives = 223/397 (55%), Gaps = 23/397 (5%)

Query: 15 KRHLLNYSILLPyLHiSVIGLIMVYSTTSVSLIQAHANPFKSVINQGVFWIISLVAITFI 74 

KR +++ IL PYL LS+IGL+ VYS +S L+QA N ++ Q +F +S I 
Sbjct: 3 KRKKIDWWILGPYLTLSMIGLLEWSASS YRLLQADEKTKSLLLRQLI FI FLSWGVI FLA 62 

Query: 75 YKLKLNFLTOTRVLTVVMLGEAFIiIiIIAR--FFTTAIKX3AHGWIVIGPVSFQPAEYLKII 132 

+KL++L + ++ + F LI+ R F + GA WI + + FQP+E + 
Sbjct: 63 RSIKLHYLLHPKIAGYGLALSIFFLILVRVGIFGVTVNGAQRWISLFGIQFQPSELMIIiF 122 

Query: 133 ^mIYIAl,TFAICIQKNISLYDYQALTRRKW^PTQ^^M3LRDt^RVYSLL^WLLVAAQPDLGNA 192
+++YL+ QP + A
Sbjct: 123 LI FYLSWFFRDGNN PPK--NLKKPFLITVSITLLILFQPKIAGA 164

Query: 193 SIIVLTAIIMFSISGIGYRWFSAILVMITGLSTVFLGTIAVIGVERVAKIP-VFGYVAKR 251
+I+ A ++F + + ++ ++V + L G + +G + +P +F + +R
Sbjct: 165 IWILSIAWIFWAAAVPFKKGIYLIVTFSALLIGAAGGVLYLGNK--GWLPQMFNHAYER 222

Query: 252 FSAFFNPFHDLTDSGHQLANSYYAMSNGGWFGQGLGNSIEKRGYLPEAQTDFVFSVVIEE 311
+ +PF D +G+Q+ +S+YA+ NGG +G+GLGNSI K+GYLPE +TDF+FS++ EE
Sbjct: 223 IATLRDPFIDSHGAGYQMTHSFYALYNGGIWGRGLGNSITKKGYLPETETDFIFSIITEE 282

Query: 312 LGLIGAGFILALVFFLILRIMNVGIKAKNPFNAMMALGVGGMMLMQVFYNIGGISGLIPS 371
LGLIGA +L L+F L +RI + + KN + LG G ++ +Q +N+G I+GL+P
Sbjct: 283 LGLIGALCVLFLLFSLCMRIFCLSSRCKMQQAGLFLLGFGTLLFVQTIMNVGSIAGLMPM 342

Query: 372 TGVTFPFLSQGGNSLLVLSVAVGFVLNIDASEKRDDI 408
TGV PF+S GG S L+LS+ +G LNI + + +++
Sbjct: 343 TGVPLPFVSYGGTSYLILSLGIGITLNISSKIQAEEIi 379

An alignment of the GAS and GBS proteins is shown below: 

Identities = 130/166 (78%), Positives = 152/166 (91%), Gaps = 2/166 (1%)

Query: 1 MANSXYAMSNGGWFGRGLGNSIEKLGYLPEATTDFVFSIVIEELGVIGAGFILALVFFLI 60
+ANS YAMSNGGWFG+GLGNSIEK GYLPEA TDFVFS+VIEELG+IGAGFILALVFFLI
Sbjct: 269 LANSYYAMSNGGWFGQGLGNSIEKRGYLPEAQTDFVFSVVIEELGLIGAGFILALVFFLI 328

Query: 61 LRIMHVGIKAKDPFNSMIALGIGAMLLMQVFVNIGGISGLIPSTGVTFPFLSQGGNSLLV 120
LRIM+VGIKAK+PFN+M+ALG+G M+LMQVFYNIGGISGLIPSTGVTFPFLSQGGNSLLV
Sbjct: 329 LRIMNVGIKAKNPFNAMMALGVGGMMLMQVFVKIGGISGLIPSTGVTFPFLSQGGNSLLV 388

Query: 121 LSVAIGFVLNIDANEKKELIMKEAEEQYK--PQEKNEKIINLDAFK 164
LSVA+GFVLNIDA+EK++ I KEAE Y+ +++N K++N+ F+
Sbjct: 389 LSVAVGFVLNIDASEKRDDIFKEAELSYRKDTRKENSKVVNIKQFQ 434

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 319

A DNA sequence (GBSx0348) was identified in S.agalactiae <SEQ ID 1029> which encodes the amino 
acid sequence <SEQ ID 1030>. This protein is predicted to be probable cell division protein ftsw (ftsW). 
Analysis of this protein sequence reveals the following: 

Possible site: 34 





>>> Seems to have an uncleavable N-term signal seq
INTEGRAL Likelihood = -9.77 Transmembrane 12 - 28 ( - 37)
INTEGRAL Likelihood = -7.22 Transmembrane 76 - 92 ( 74 - 97)
INTEGRAL Likelihood = -6.53 Transmembrane 182 - 198 ( 178 - 201)
INTEGRAL Likelihood = -4.62 Transmembrane 51 - 67 ( 46 - 69)
INTEGRAL Likelihood = -2.87 Transmembrane 202 - 218 ( 202 - 218)






Final Results

bacterial membrane --- Certainty=0.4906 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



A related GBS nucleic acid sequence <SEQ ID 9327> which encodes amino acid sequence <SEQ ID 9328> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database: 



Query: 1 MKIDKRHLLNYSILIPYLILSILGLIVIYSTTSATLIQLGANPFRSVINQGVFWAVSLVA 60
M ++K + LNYSILIPYLIL+ +G+++I+STT +Q G NP++ VINQ F +S++
Sbjct: 1 MNLNKNNFLNYSILIPYLILAGIGIVMIFSTTVPDQLQKGLNPYKLVTNQTAFVLLSIIM 60

Query: 61 IIFIYKLKLNFLKNSKVLTMAVLVEVFLLLIARF FTQEVNGAHGWIVIGPI-SF 113
I IY+LKL LKN K++ + +++ + L+ R T VNGA GWI I I +
Sbjct: 61 IAVIYRLKLR&LKNRKMIGIIMVILILSLIFCRIMPSSFALTAPVNGARGWIHIPGIGTV 120

Query: 114 QPAEYLKVIIVWYLAFTFARRQKRIEIYDYQALTKGRWLPRSLSDLKDWRFYSLFMIGLV 173
QPAE+ KV I+WYLA F+ +Q-H+IE D + KG+ L + L WR + ++ +
Sbjct: 121 QPAEFAKVFIIWYLASVFSTKQEEIEKMDINEIFKGKTLTQKL--FGGWRLPWAILLVD 178

Query: 174 IAQPDLGNGSIIVLTVIIM 192
+ PDLGN II +IM
Sbjct: 179 LIMPDLGNTMIIGAVALIM 197



There is also homology to SEQ ID 1028. 

A related GBS gene <SEQ ID 8545> and protein <SEQ ID 8546> were also identified. Analysis of this
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 6 
McG: Discrim Score: 15.18 
GvH: Signal Score (-7.5): -3.58 
Possible site: 34

>>> Seems to have an uncleavable N-term signal seq



ALOM program 



INTEGRAL 
INTEGRAL 
INTEGRAL 
INTEGRAL 



Likelihood 
Likelihood 
Likelihood 
Likelihood 
Likelihood 



7 threshold: 

Transmembrane 12 - 28 

Transmembrane 76 - 92 

Transmembrane 210 - 226 

Transmembrane 182 - 198 

Transmembrane 51 - 67 
116 



178 - 201) 



* Reasoning Step: 3 



bacterial membrane --- Certainty=0.4906 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the databases: 

ORF02700(301 - 876 of 1377) 

EGAD|8615|8419(1 - 197 of 198) hypothetical protein in rpmg 3'region, fragment
{Lactococcus lactis} SP|P27174|YRG2_LACLA HYPOTHETICAL PROTEIN IN RPMG 3'REGION (ORF2)
(FRAGMENT). GP|44069|emb|CAA44490.1||X62621 ORF2 N-terminal {Lactococcus lactis}
PIR|PC1134|PC1134 hypothetical protein 198 (rmpG 3' region) - Lactococcus lactis (fragment)
%Match =15.1

%Identity =42.3 %Similarity =64.9 

Matches = 82 Mismatches = 64 Conservative Sub.s = 44 

87 117 147 177 207 237 267 297

Ka *I*Y*I**L*LVILFLLPFFINFL*IYLTGLHD*HVPSNISN*SFIWISIVGGYXX*LIXXXII«raGNFLKY*RK*Y 

327 357 387 417 447 477 507 537

NMKIDKRHLIOTSILIPYLILSILGLIVIYSTTSATLIQLaai,ipFRSVIICGVFVfAVSLVAIIFIYKLKmFLKNSKVLT 
I ::| ==111111111111= :|:: = |:||| =1 I ll = = I I I I I = l = = I 11 = 111 III h = 
MM^KNNFLNYSILIPYLIIAGIGIVMIFSTWPDQLQKGI^ 

10 20 30 40 50 60 70 


567 585 609 636 666 696 726 756 

MAVLVEVFLLL1ARF FT - - QE VNGAHGWI VIGP I - SFQ PAE YLKVI IVWYLAFTFARRQKKIE I YDYQALTKGRWL 

: : = = -hi I Mil III I I = lllb II 1 = 1111 1 = =1-11 I : 11= I 

IltWILILSLIFCRIMPSSFALTAPVNGARGWIHIPGIGTVQPAEFAKVFIIt'JYLASVFSTKQEEIEKNDINEIFKGKTL 
90 100 110 120 130 140 150

786 816 846 876 906 936 966 996 

PRSLSDLKDWRFYSLFMIGLVIAQPDLGNGSIIVLTVIIMYCISGIGYRWFSALLGLIWGSTLFIGTIAWGVETMAKV 

= i = n= === = = iiiii ii =n 

TQKL--FGGWRLPWAILLVDLIMPDLGNTMIIGAVALIMI

170 180 190 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 320 

A DNA sequence (GBSx0349) was identified in S.agalactiae <SEQ ID 1031> which encodes the amino
acid sequence <SEQ ID 1032>. Analysis of this protein sequence reveals the following:

Possible site: 22

>>> Seems to have no N-terminal signal sequence 

Final Results

bacterial cytoplasm --- Certainty=0.3665 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.

A related DNA sequence was identified in S.pyogenes <SEQ ID 1033> which encodes the amino acid 
sequence <SEQ ID 1034>. Analysis of this protein sequence reveals the following: 

Possible site: 54 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2373 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below:

Identities = 35/41 (85%), Positives = 37/41 (89%)

Query: 1 MEKEAKQIIDLKRNLFKIDVRAQKDEEKVFMRTACCYSPFY 41
+EKEAKQ+IDLKRNLFKIDVRAQKDEEKVFMRTAC S Y
Sbjct: 1 LEKEAKQMIDLKRNLFKIDVRAQKDEEKVFMRTACRQSRVY 41

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
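
The identity count reported for the short GAS/GBS alignment above can be reproduced directly from the two aligned sequences. The sketch below is illustrative only: the function name `count_identities` is hypothetical, and the two strings are the Query and Sbjct rows as transcribed here, with a stray extraction space removed from the Query row.

```python
def count_identities(query, subject):
    """Count aligned positions where both rows carry the same residue.

    Both arguments are rows of one alignment and must have equal length;
    gap characters ('-') are never counted as identities.
    """
    if len(query) != len(subject):
        raise ValueError("aligned rows must have the same length")
    matches = sum(1 for q, s in zip(query, subject) if q == s and q != "-")
    return matches, len(query)


QUERY = "MEKEAKQIIDLKRNLFKIDVRAQKDEEKVFMRTACCYSPFY"  # GBS, residues 1-41
SBJCT = "LEKEAKQMIDLKRNLFKIDVRAQKDEEKVFMRTACRQSRVY"  # GAS, residues 1-41

matches, length = count_identities(QUERY, SBJCT)
print(f"Identities = {matches}/{length} ({100 * matches // length}%)")
# Identities = 35/41 (85%)
```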

Example 321

A DNA sequence (GBSx0351) was identified in S.agalactiae <SEQ ID 1037> which encodes the amino 
acid sequence <SEQ ID 1038>. Analysis of this protein sequence reveals the following: 

Possible site: 49 

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -1.65 Transmembrane 73 - 94 ( 78 - 95)
INTEGRAL Likelihood = -1.33 Transmembrane 421 - 437 ( 420 - 437)

Final Results

bacterial membrane --- Certainty=0.1659 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAA00827 GB:A09073 phosphoenol pyruvate carboxylase 
[Corynebacterium glutamicum] 
Identities = 335/958 (34%), Positives = 539/958 (55%), Gaps = 80/958 (8%)

Query: 22 EIITEEVGLLKQLLDEATQKLIGSESFDKIE-- KIVSLSLTD— DYTGLKETISALSNE 76 

+ + +++ L Q+L E + G E ++ +E 4-+ S + + L + 4-4- 

Sbjct: 3 DFLRDDIRFLGQILGEVIAEQEGQEVYELVEQARLTSFDIAKGNAEMDSLVQVFDGITPA 62 

Query: 77 EMVIVSRYFSILPLLINISEDVDIAYEI1WKNNINQDYLGKLST TIDW 125 

+ +4-R FS LL H++ED+ Y L 4- L T T+D 

Sbjct: 63 KATPIARAFSHFALLANLAEDL YDEELREQALDAGDTPPDSTLDATWLKLNEG 115 

Query: 126 - AGHENAKD I IiEHVl^rTOvLTAHPTQVQRKTVnELTSKIHDLLRKXRDVKAGI VNQ 180 

G E D+L + V PVLTAHPT+ +R+TV + I +R+ +++ 
Sbjct: 116 OTGAEAVADVTjRNAEVAPVLTAHPTETRRRn'FDAQKVIITTHMRERHALQSAEPTARTQS 175 

Query: 181 - -EKWYADLKRYIGI IMQTDTIREKKLKVK^EITlWlffiY^nKSLlKAVTKLTAEYKAIjAA 238 

+4 ++RR I 14- QT IR + ++++EI 4- YY SL++ 4- +4- + 
Sbjct: 176 KLDEIEKNIRRRITILWQTALIRVARPRIEDEIEVGLRYYKLSLLEEIPRIfqRDVAVELR 235

Query: 239 KK---GIHLENPKPLTM-GMWIGGDRDGNPFVTAETLRLSAMVQSEVIINHYIEQLNELy 294 

4+ G+ L KP+ G WIGGD DGNP+VTAET4- S 4-E +4- +Y QL4- h 
Sbjct: 236 ERFGEG VPL KPVVKPGSWIGGDHDGNPYTCAETVEYSTHRAAETVLKYYflRQLHSLE 292 

Query: 295 FJ^SLSII^TEVSPELTCIJWQSQDNSVYRHffiPYRKAFNFIQDKLVQTLLNLKT/GSSPK 354 

+SLS + 4-V+P+L-l- IA4- +4- R 4-EPYR+A 4- 4-4- +4-4- T 
Sbjct: 293 HELSLSDR^INKOTPQLIALAEAGHNDTOSRVDEPYRRAVHGVRGRIIiAT--- 341 

Query: 355 EKFVSRQESSDIVGRYIKSHIAQVASDIQTEELPAYATAEEFKQDLLLVKQSLVQYGQDS 414 

+4-++4-G 4- 4- YA-f EEF D L + SL 4 

Sbjct: 342 TAELIGE DAVEGVWFKVFTPYASPEEFLNDALTIDHSLRESKDVL 386 

Query: 415 LvTJGELACLIQATOIFGFYIaaTIDMRQDSSII^CVAELLKSAMIVDDYSSLSEEEKCQL 474 

4- D L+ LI A4-4- FGF L 4-D+RQ+S E 4- EL 4- A 4- 4-Y LSE EK 4-4- 
Sbjct: 387 IADDRLSVLISAIESFGFI^YALDLRQNSESYEDVLTELFERAQVTANYRELSEAEKLEV 446 

Query: 475 LLKELTEDPRTLSSTHAPKSELLQKEIAIFQTARELKDQLGEDI INQHI ISHTESVSDMF 534 

LLKEL + SE+ 4-EL IF4-TA E 4- G 4-4- IIS SV4-D4- 

Sbjct: 447 LLKELRSPRPLIPHGSDEYSEVTDREU3IFRTASEA\rKKFGPRMVPHCIISMASSVTDVL SOS 

Query: 535 ELAIMLKEVGLIDAN QARIQIVPLFETIEDIOTSRDIMTQYLHYELWKWmraMST 590 

E 4-4-LKE GLI AN 4-4- 4-4-PLFETIEDIi 14- 4- 4-L 4- 4-4- +N 

Sbjct: 507 EPMVLLKEFGLIAANGDNPRGTVDVIPLFSTIEDLQAGAGILDELWKIDLYRNYLLQRDN 566 

Query: 591 YQEIMLGYSDSNKDGGYLSSGWTLYKAQNELTKIGEENGIKITFFHGRGGTVGRGGGPSY 650 

QE+MLGYSDSNKDGGY S4- W LY A+ 4-L 4-4- G+K4- FHGRGGTVGRGGGPSY 
Sbjct: 567 VQEvm ( GYSDSNKDGGYFSA^IWALYDAELQLVELCRSAGVKLRLFHGRGGTVGRGGGPSY 626 
Sbjct: 627 DAHiAQPRGAVQGSVRITEQGEIISAKYGNPETARRMLEALVSATLE ASLLDVSEL 682

Query: 711 INFRETMDGIVSESNAV YRNIiVFDNPYFYDYFFEASP I KE VSSLNIGSRPA&RKTI 766
+ 4 D I+SE + + Y +LV ++ F EOT +++P++E+ SMIGSRP++RK
Sbjct: 683 •DHQRAYD-IMSEISELSLKKyASLVHEDQGFIDYFTQSTPLQEIGSLNIGSRPSSRKQT 741

Query: 767 •EISGLRAIPWVFSWSQNRIMFPGWYGVGSAFKHFI - - -EQDFJWLAKLQTMYQKWPFFN 823
+ LRAIPW SWSQ+R+M PGN+GVG+A + +1 EQ +A+LQT+- + WPFF

V+SK+ + +A YA Ii EV + V++-M E+ LTK M

+NP+L S+ R PY IiK +Q+E+++R R E + I +T+NG++T LRNSG
Sbjct: 862 DNPLLARSVQRRYPYLLPLNVIQVEMMRRYRKGDQSEQVSRNIQLTMnGLSTALRNSG 919

A related GBS nucleic acid sequence <SEQ ID 10961> which encodes amino acid sequence <SEQ ID
10962> was also identified.

A related DNA sequence was identified in S.pyogenes <SEQ ID 1039> which encodes the amino acid
sequence <SEQ ID 1040>. Analysis of this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.1613 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 659/927 (71%), Positives = 779/927 (83%), Gaps = 11/927 (1%)

Query: 14 KLESSSNKEIITEEVGLLKQLLDEATQKLIGSESFDKIEKIVSLSLTDDYTGLKETISAL 73
KLESS+N++1I EEV LLK++L+ T+++IG ++F IE 1+ LS DY L++ ++ +

SN+EM ++SRYFSILPLMNISEDVDIAYEINY-KMN + DYLGKL+ TI +AG +N KD

ILE VNWPVLTAHPTQVQRKT+LELT+ IH LLRKYRD KAG++N EKW +L RYI -\

MWIGGDRDGNPFVTAETL hSA VQSEVI+N+YI++L LYR SLS L ^

A+ SQD S+YR NEPYR+AF++IQ +LQT + L ++SS4 S
Sbjct: 305 ASLSQEQSIYRGNEPYRRAFHYIQSRLKQTQIQLT NQPAASMSSSVGUWSAWS 358

IA+IDMRQDSS+ EACVAELLK ANIVDDYSSLSE EKC +LL++L E+PRTLSS
Sbjct: 414 IASIDMRQDSSVQEACVAELLKGMIITODYSSLSETEKCDVLLQQLMEEPRTLSSAAVAK 473



Query: 494 SELLQKELAIFQTARELKDQLGEDI INQHI ISHTE3VSDMFEIAIMLKEVGLIDANQARI 553 

S+LL+KELAI+ TARELKD+LGE++I QHIISHTESVSDMFELAIMLKEVGL+D +AR+ 
Sbjct: 474 SDLLEKELAIYTTflRELKDKLGEEVIKQHIISHTESVSDMPELAIMLKEVGLVDQQRARV 533 

Query: 554 QIVPLFETIEDLDNSRDIMTQYLHYELVKKWIATNNNYQEIMLGYSDSNKDGGYLSSGWT 613 

QIVPLFETIEDLDN+RDIM YL +++VK WIATN NYQEIMLGYSDSNKDGGYL+SGWT 
Sbjct: 534 QIVPLPETIEDLDNARDIMAAYLSHDIVKSWIATNRNYQEIMLGYSDSNKDGGYLASGWT 593 

Query: 614 LYKAQNELTKIGEENGIKITFFHGRGGTVGRGGGPSYEAITSQPFGSIKDRIRLTEQGEI 673 

LYKAQNELT IGEE+G+KITFFHGRGGTVGRGGGP3Y+AITSQPFGSIKDRIRLTEQGEI 
Sbjct: 594 LYKAQNELTAIGEEHGVKITFFHGRGGTVGRGGGPSYDAirSQPFGSIKDRIRLTECJGEI 653 

Query: 674 IENKXGNQDAAYYNLEMLISASIDRMVTRMI'raPNEIDNFRSTMDGIVSESNAVYRNLVF 733 

IENKYGN+D AYY+LEMLISASI+RMVT+MIT+PNEID+FRE MD IV++SN +YR LVF 
Sbjct: 654 IENKYGNKDVAYYHLEMLISASINR.MVTQMITDPNEIDSFREIMDSIVADSNIIYRKLVF 713 

Query: 734 DNPYFYDYFFEASPIKEVSSIjNIGSRPAARKTITEISGLRAIPWVFSWSQNRIMFPGWYG 793 

DNP+FYDYFFEASPIKEVSSLNIGSRPAARKTITEI+GLRAIPWVFSWSQNRIMFPGWYG 
Sbjct: 714 DNPHFYDYFFEASPIKEVSSLNIGSRPAARKTITEITGLRAIPWVFSWSQNRIMFPGWYG 773 



Query: 794 VGSAFKHFIEQDEAI^KLQTMYQKJrtTFFNSLLSI^MVLSKSNMNIALQYAQLAGSKEV 853 

VGSAFK +I++ + NL +LQ MYQ WPFF+SLLSNVDMVLSKSNMNIA QYAQLA ++V 
Sbjct: 774 VGSAFKRYIDFAQ^NLERLQHMYQraPFFKSLLSNVDI-IVLSKSNMNIAFQYAQLAERQDV 833 

Query: 854 RDVFNIILNEWQLTKDMIIiAIEQHDNLLEENPMLHASLDYRLPYFN\7LNYVQIELIKRLR 913 

RDVF IL+EWQLTK4-+ ILAI + HD+LLE+NP L SL RLPYENVLNY+QIELIKR R 
Sbjct: 834 RDVFYEILDEWQLTKWIIAIQDHDDLLEDNPSLKHSLKSRLPYFNVLNYIQIELIKRWR 893 

Query: 914 SNQLDEDYEKLIHITINGIATGLRNSG 940 

+NQLDE+ EKLIH TINGIATGLRNSG 
Sbjct: 894 NNQLDENDEKLIHTTINGIATGLRNSG 920

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 



Example 322 

A DNA sequence (GBSx0352) was identified in S.agalactiae <SEQ ID 1041> which encodes the amino 
acid sequence <SEQ ID 1042>. This protein is predicted to be Bacillus licheniformis Pz-peptidase
homologue (pepF). Analysis of this protein sequence reveals the following:

Possible site: 61

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3012 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



A related DNA sequence was identified in S.pyogenes <SEQ ID 1043> which encodes the amino acid
sequence <SEQ ID 1044>. Analysis of this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3137 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below: 

Identities = 512/593 (86%), Positives = 564/593 (94%)

Query: 1 MKLKKESKFPENELWDLTMjYKDRQDFLIAIEKALEDIK^FKIO^EGKIMOTEDFTSALM 60 

M+LKKRSEFPENELWDLTALYKDRQDFLLAIEKAL+DI +FK+NYEG+L V+DFT AL+ 
Sbjct: 26 MELKTOSEFPENELWDLTALYIODRQDFIiIAIElC^QDIDIjFKRMYEGRLTSVDDFTQALI 85 

Query: 61 EIEHIYIQMSHIDTYAFMPQTTDFSNEEFAQISQAGSDFATKANVLLSFFNTALANADIK 120 

EIEHIYIQMSHI TYAFMPQTTDFS+E FAQI+QAG DF TKA+V LSFF+TALANAD+ 
Sbjct: 86 EIEHIYIQMSHIGTYAFMPQTTDFSDESFAQIAQAGDDFMTKASVALSFFDTALANADLD 145 

Query: 121 ILDSLEKNPHFKATIRQAKIQKQHLLSPEVEKALTIJU^VUTOPYDIYTKMRAGDFDMED 180 

+LD+LE NP+F A 1R AKIQK+HLLSP+VEKAL NL EV+N PYDIYTKMRAGDFDM+D 
Sbjct: 146 VLDTLEKNPYFSAAIRMAKIQKEHLLSPDVEKALANLREVIKAPYDIYTKMRAGDFDMDD 205 

Query: 181 FEVDGKTYKNSFVTYENYFQNHENAEIREKS FRS FS KG1RKHQNAAAAAYLAKVKSEKLI 240 

FEVDGKTYKNSFV+YEN++QNHENAEIREK+FRSFSKGI.RKHQN AAAAYLAKVKSEKL+ 
Sbjct: 206 FEVDGKTYKNSFVSYENFYQNHEKAEIREKAFRSFSKGLRKHQNTAAAAYLAKVKSEKLL 265 

Query: 241 ADMRGYDSVFDYLLSEQEVDRSMFDRQIDLIMDEFGPVAQRFLKHIADVNGIEKMTFADW 300 

ADM+GY SVFDYLL+EQEVDRS+FDRQIDLIM EFGPVAQ+FLKH4A VNG+EKMTFADW 
Sbjct: 266 ADMKGYASVFDYLLAEQEVDRSLFDRQIDLIMTEFGPVAQKFLKHVAQVNGLEKMTFADW 325 

Query: 301 KLDIDNELNPEVSINDAYDLVMKSVAPLGKEYSQEVERYQKERWVDFAANANKDSGGYAA 360 

KLDIDN+LNPEVSI+ AYDLVMKS+APLG+EY++E+ERYQ ERWVDFAANANKDSGGYAA 
Sbjct: 326 KLDIDNDLNPEVSIDGAYDLVMKS1APLGQEYTKEIERYQTERWVDFAANANKDSGGYAA 385 

Query: 361 DPYKVHPYVLMSWTGRMSDVYTLIHEIGHSGQFIFSDNHQSFFNTHMSTYYVEAPSTFNE 420 

DPYKVHPYVLMSWTGRMSDVYTLIHEIGHSGQFIFSDNHQS+FMTHMSTYYVEAPSTFNE 
Sbjct: 386 DPYKVHPYVLMSWTGRMSDVYTLIHEIGHSGQFIFSDKHQSYFNTHMSTYYVEAPSTFNE 445 

Query: 421 LLLSDYLENQFDTARQKRFALAHRLTDTYFHNFITHLLEAAFQRICVYTLIEEGGTFGAEQ 480 

L+LSDYLE4QFD RQKRFALAHRLTDTYFHNFITHLLEAAFQRKVYTLIEEGGTFGA+Q 
Sbjct: 446 LMLSDYLEHQFDDPRQKRFALAHRLTDTYFHKFITHLLEAAFQRKVYTLIEEGGTFGADQ 505 

Query: 481 LNAIMKEVLTQFWGDAIEIDDDARLTWMRQAHYYMGLYSYTYSAGLV1STAGYLNLKNNP 540 

LNA+MKEVLT FWGDA++IDDDAALTWMRQAHYYMGLYSYTYSAGLVISTAGYLNLK+NP 
Sbjct: 506 I^A^KEVLTDFWGDAVDIDDDAALTWMRQAHYYMGLYSYTYSAGLVISTAGYIiNLKHNP 565 

Query: 541 NGAKEWLAFLKSGGSRTPLETALLISADISTDKPLRDTINFLSNTVDQIINYS 593 

NGAKEWL FLKSGGSRTPL4TA+LI ADI+T+KPLRDTI FLS+TVDQII+Y+ 
Sbjct: 566 NGAKEWLDFLKSGGSRTPLDTAMLIGADIATEKPLRDTIQFLSDTVDQIISYT 618 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 323 

A DNA sequence (GBSx0353) was identified in S.agalactiae <SEQ ID 1045> which encodes the amino 
acid sequence <SEQ ID 1046>. Analysis of this protein sequence reveals the following: 

Possible site: 19

>>> May be a lipoprotein

Final Results

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 

A related DNA sequence was identified in S.pyogenes <SEQ ID 1047> which encodes the amino acid 
sequence <SEQ ID 1048>. Analysis of this protein sequence reveals the following: 

Possible site: 19

>>> May be a lipoprotein

Final Results

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
An alignment of the GAS and GBS proteins is shown below: 

Identities = 72/127 (56%) , Positives = 85/127 (66%) 

Query: 1 MIOCYIKLFLLTVFATTLVACGQPSTSNICTTTSSTLEVGKVELVVKEDTNVLSEKVVYHKG 60 

+ K K L + A LVAC Q + +TT S V LWKEDTN + EKV + KG 

Sbjct: 1 WKRFKTGFLALVAMLLVACSQGTKQIQTTPSVPIOflHHVRLWKEDTNTVDEKVSFGKG 60 

Query: 61 DTVLDVLKANYKVKEKDGFITSIDGISQDETKGLYVMFKVNNKIAPKAANQIKVKKNDKI 120 

DTVL+VLK NY+VKEKDGFIT4IDGI QD YW+FKVN K+A K A+QI VK D I 

Sbjct: 61 DTVLEVLKDNYE^nCEKDGFITAIDGIEQDTKANKYl^FKWGKMADKGADQITVKDGDSI 120 

Query: 121 EFYQEVY 127 

EFYQEV+ 
Sbjct: 121 EFYQEVF 127 

SEQ ID 1046 (GBS185) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 28 (lane 6; MW 15.7kDa). 

GBS185-His was purified as shown in Figure 199, lane 8. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 324 

A DNA sequence (GBSx0354) was identified in S.agalactiae <SEQ ID 1049> which encodes the amino 
acid sequence <SEQ ID 1050>. Analysis of this protein sequence reveals the following: 

Possible site: 28

>>> Seems to have a cleavable N-term signal seq.

INTEGRAL Likelihood = -4.… Transmembrane … - 91 ( 67 - …)
INTEGRAL Likelihood = -4.41 Transmembrane 33 - 49 ( 30 - …)
INTEGRAL Likelihood = -2.60 Transmembrane 53 - 69 ( 52 - …)
INTEGRAL Likelihood = -1.38 Transmembrane 108 - 124 ( 106 - …)
INTEGRAL Likelihood = -0.06 Transmembrane 149 - 165 ( 149 - …)

Final Results

bacterial membrane --- Certainty=0.2784 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9731> which encodes amino acid sequence <SEQ ID 9732>
was also identified. A further related GBS nucleic acid sequence <SEQ ID 10929> which encodes amino 
acid sequence <SEQ ID 10930> was also identified. 

The protein has no significant homology with any sequences in the GENPEPT database. 

A related DNA sequence was identified in S. pyogenes <SEQ ID 1051> which encodes the amino acid
sequence <SEQ ID 1052>. Analysis of this protein sequence reveals the following:

Possible site: 48

>>> Seems to have a cleavable N-term signal seq.

INTEGRAL Likelihood = -7.96 Transmembrane 50 - 66 ( 49 - 71)

INTEGRAL Likelihood = -5.73 Transmembrane 101 - 117 ( 99 - 124) 

INTEGRAL Likelihood = -4.41 Transmembrane 141 - 157 ( 139 - 159) 

INTEGRAL Likelihood = -4.25 Transmembrane 73 - 89 ( 67 - 92) 

Final Results 

bacterial membrane --- Certainty=0.4185 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>
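
The INTEGRAL lines list predicted transmembrane segments in order of likelihood rather than position. As a hedged illustration, the sketch below parses such lines and re-orders the segments along the sequence, which makes the predicted topology easier to read. `Segment` and `parse_segments` are hypothetical names introduced here, and only the first residue range on each line is read; the bracketed range is ignored.

```python
import re
from typing import NamedTuple


class Segment(NamedTuple):
    likelihood: float
    start: int
    end: int


# Captures the likelihood and the first residue range of each INTEGRAL line.
SEGMENT_RE = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+(?:\.\d+)?)\s+Transmembrane\s+(\d+)\s*-\s*(\d+)"
)


def parse_segments(text: str) -> list[Segment]:
    """Return the transmembrane segments sorted by start position."""
    segments = [
        Segment(float(score), int(start), int(end))
        for score, start, end in SEGMENT_RE.findall(text)
    ]
    return sorted(segments, key=lambda seg: seg.start)


# Lines reported above for the S. pyogenes protein <SEQ ID 1052>.
SAMPLE = """\
INTEGRAL Likelihood = -7.96 Transmembrane 50 - 66 ( 49 - 71)
INTEGRAL Likelihood = -5.73 Transmembrane 101 - 117 ( 99 - 124)
INTEGRAL Likelihood = -4.41 Transmembrane 141 - 157 ( 139 - 159)
INTEGRAL Likelihood = -4.25 Transmembrane 73 - 89 ( 67 - 92)
"""

for seg in parse_segments(SAMPLE):
    print(f"{seg.start:>4} - {seg.end:<4} likelihood {seg.likelihood}")
```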

The protein has no significant homology with any sequences in the GENPEPT database. 
An alignment of the GAS and GBS proteins is shown below: 

Identities = 82/163 (50%), Positives = 120/163 (73%), Gaps = 3/163 (1%) 

Query: 10 LTRVAILSALCTA7LRYAFAPLPNIQPITAIFLITVVLFDLKEGVATOTITMLVSSFLMGF 69 

++R+AI+SALCWLR F+ LPN+QP+TA L ++FLEV + + + +S+FL+GF 
Sbjct: 6 MSRIAIMSALCVVLRMVFSSLPNVQPOTAFLLSYLLYFGIAEAVLVMMLCLFLSAFLLGF 65 

Query: 70 GPWVFLQIISFTLILCLWKFLIYPLTKAVCFGKITEWLQTFFAGGLGWYGVIIDTCFA 129 

GPWVF Q+ F L+L LW+F++YPL++ F K ++ Q F G++YGV+IDTCFA 
Sbjct: 66 GPWVFWQVTCFVLVLLLWRFVLYPLSQQ- - FPKY-QLGCQAFLVALCGLLYGVLIDTCFA 122 

Query: 130 WLYHMPWWTYVIAGLSFNMAHALSTCLFYPILLPILRRFRNEK 172 

+LY MPWW+YVLAG+ FN+AHALST +F+P+++ + RR E+ 
Sbjct: 123 YLYSMPWWSYVLAGMPFNIAHALSTLVFFPVVMMLFRRLIGEQ 165 

A related GBS gene <SEQ ID 8549> and protein <SEQ ID 8550> were also identified. Analysis of this
protein sequence reveals the following:

Lipop: Possible site: -1 Crend: 10 

McG: Discrim Score: 6.79 

GvH: Signal Score (-7.5): -0.91 
Possible site: 28

>>> Seems to have a cleavable N-term signal seq.

ALOM program count: 3 value: -4.46 threshold: 0.0 

INTEGRAL Likelihood = -4.46 Transmembrane 35 - 51 ( 29 - 54) 
INTEGRAL Likelihood = -1.38 Transmembrane 68 - 84 ( 66 - 84) 
INTEGRAL Likelihood = -0.06 Transmembrane 109 - 125 ( 109 - 125) 
PERIPHERAL Likelihood =7.53 88 
modified ALOM score: 1.39 

*** Reasoning Step: 3 

Final Results 

bacterial membrane --- Certainty=0.2784 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

ORF01220 (421 - 552 of 1002) 

GP|9950155|gb|AAG07353.1|AE004814_8|AE004814(16 - 56 of 69) hypothetical protein
{Pseudomonas aeruginosa}
%Match =3.2
%Identity =39.5 %Similarity =60.5
Matches = 17 Mismatches = 15 Conservative Sub.s = 9

222 252 282 312 342 372 402 432 

STLTKLTRVAILSALCWLRYAFAPLPNIQPITAIFLII^/VLFDLKEGVATVTITMLVSSFLMGFGPWVFLQIISFTLIL 

|::: 

MDPELFEEWMMTGLVTVLI 



462 492 522 552 582 612 642 672 

CLWKFLIYPLTKAVCFGKITEWLQTFFAGGLGWYGVIIDTCFAWLYHMPWWTYVLAGLSFNMAHALSTCLFYPLLLPI 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 325 

A DNA sequence (GBSx0355) was identified in S.agalactiae <SEQ ID 1053> which encodes the amino 
acid sequence <SEQ ID 1054>. This protein is predicted to be endolysin. Analysis of this protein sequence 
reveals the following: 

Possible site: 28

>>> Seems to have a cleavable N-term signal seq.

Final Results

bacterial outside --- Certainty=0.3000 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database:

>GP:CAA72266 GB:Y11477 endolysin [Bacteriophage Bastille]
Identities = 64/210 (30%), Positives = 95/210 (44%), Gaps = 15/210 (7%)

Query: 65 KPIirWSGWQLPKEIDYDTLSKNISGWIRVFGGSKISKT^AAYTTGIDKSFKTHIKEF 125 
K I+D+S +ID+DT +S + R G + + +N +D+ +KT + 

25 Sbjct: 12 KTITOISHHm--DIDFDTAKNYVSMFIARTGDGHRra--SNGELC^VVDRKYKTFVANM 67 

Query: 126 QKRNIPVAWSYALGSSVKEMKEFAQIFYKNAAPYKPTFYWIDVEEETMSNMNKGVQAFR 185 

+ R IP Y + S V K+EA+ F+N T+DETNM + +QF 

Sbjct: 68 KARGIPFGNYMFNRFSGVASAKQEAEFFK-NYGDKDATVWVCDAEVSTAPNMKECIQVFI 126 

30 

Query: 186 KELKRLGAKNVGIYIGTYFMTEQGISVKGFDAVWIPTYGSDSGYYEAAPQTELKYDLHQY 245 
LK LGAK VG+YIG + EG D MP YG+ + DL Q+ 

35 Query: 246 TSQGYLPGFNQPLDLNQIAVNKDKKKTYEK 275 

T G + G + D+N + +K EK 
Sbjct: 178 TEYGNIAGIGK-CDINVLYGDKPMSFFTEK 206 

A related DNA sequence was identified in S.pyogenes <SEQ ID 1055> which encodes the amino acid
sequence <SEQ ID 1056>. Analysis of this protein sequence reveals the following:

Possible site: 31

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood =-16.98 Transmembrane 8 - 24 ( 3 - 28)

Final Results

bacterial membrane --- Certainty=0.7793 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below:

Identities = 198/278 (71%) , Positives = 235/278 (84%) 

Query: 1 MPJRRIKPIWAVFFSLFGLLLIIGHLHSTNTLKKELVSAKKTIPSVKASKVPQKSTSSKD 60 
MRR+IKPIW VFF L ++LIIG + + +KE+ +AK IP ++ K+++S+ 
Sbjct: 1 MRRKIKPIWLVFFILLAMVLIIGKRQANHAKQKEVEDAKSHIPIATSNPGKAKTSTSET 60



Query: 61 KEFVLKPIIDVSGWQLPKEIDYDTLSKNISGWIRVFGGSKISKTNNAAYTTGIDKSFKT 120 
++F+L PI+DVSGWQLP+EIDYDTLS++ISG ++RV+GGS+I+ NNAA+TTGIDKSFKT 
Sbjct: 61 EDFILNPIVDVSGWQLPEEIDYDTLSRHISa^IWWC-GSQITAHNNAAFTTGIDKSPKT 120

Query: 121 HIKEFQKIWIPVAWSYALGSSVKEMCEEaQIFyKNiUiPYKPTFYWIDVEEErMSNMNKG 180 

H I KEFQKRN+ PVAVYS YALG S KEMKEEA+ FYKNAAPY PT+YWIDVEE TM +MNKG 
Sbjct: 121 HIKEFQKIWPVAVYSYALGRSTKEMKEFJUIRFYKNAAPYNPTYYWIDVEEATMKDMNKG 180 

Query: 181 VQAFRKELKRLGAKNVGIYIGTYFMTEQGISVKGFDAWIPTYGSDSGYYEAAPQTELKY 240 

V AFR+ELK+LGA+NVG+YIGTYFM EQ IS KGFD+VWI PTYGSDSGYYEAAP T L Y 
Sbjct: 181 VTAFREELKKLGAENVGLYIGTYFMAEQDISTKGFDSVWIPTYGSDSGYYEAAPNTTLDY 240 

Query: 241 DLHQYTSQGYLPGFHQPLDLNQIAVNKDKKKTYEKLFG 278 

DLHQYTSQGYL GFN LDLNQIAV KD KKT+EKLFG 
Sbjct: 241 DLHQYTSQGYLSGFNNALDLNQIAVTKDTKKTFEKLFG 278 

A related GBS gene <SEQ ID 8551> and protein <SEQ ID 8552> were also identified. Analysis of this
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 5
McG: Discrim Score: 13.20
GvH: Signal Score (-7.5): -0.72

Possible site: 28
>>> Seems to have a cleavable N-term signal seq.
ALOM program count: 0 value: 7.05 threshold: 0.0
PERIPHERAL Likelihood = 7.05 196
modified ALOM score: -1.91

*** Reasoning Step: 3

Final Results

bacterial outside --- Certainty=0.3000 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 
32.4/47.3% over 194aa 



ORF01218(496 - 1125 of 1446) 

GP|1865711|emb|CAA72266.1||Y11477(12 - 206 of 364) endolysin {Bacteriophage Bastille}
%Match =7.9 

%Identity =32.3 %Similarity =47.3 

Matches = 65 Mismatches = 100 Conservative Sub.s = 30 

315 345 375 405 435 465 495 525 

VTISimRRIKPIWAVFFSLFGLLLIIGHLHSTlTrLKKELVI^KTIPSVKASKyPQKSTSSKDKEFVLKPIIDVSGWQ 

:| I |:|:| = 
MALEANKYPKEKTIVDIS- -H 



LPKEIDYDTLSKNISGWIRVFGGSKISKTNNAAYTTGIDKSFKTHIKEFQKRNIPVAVYSYALGSSVKEMKEEAQIFYK 

=11=11 =11 =1 I =1 =1= =11 : = I II I = II 1=11= 1= 

HSOffllDFDT-AKSnrTVSMFIARTGDGHRYNSN-GELCGVTO 



795 825 855 885 915 945 975 1005 
NAAPYKPTFYWIDVIlEETMSlWKGVQAFRraLKRLGAKlP/GIYIGTYFMTEQSISvICGFDAVWIPTYGSDSGYYEAAPQ 
I | = | | I || = >|| II llll 11=111 =11 I III II 
NYGDKDATVWCDAEVSTAPIMKECIQVFIDRLKELGAiacVGLYIGIIKICYQEFGGroVNCDFTWIPRYG NK 



1035 1065 1095 H25 1155 1185 1215 1245 

TELKYDLHQYTSQ^YLPGXNQPLDMQIAVNKDKKKTYEK^^ 

= || |=| I = I = l=|.= =1 HI M • 1= = = = = 
PAFACDLWQWTEYGNIAGIGK- CDINVLYGDKPMSFFTEKKGAKETLVPALKKVVTYEVGTKLI PEIQDKLAFLGYEARI
180 190 200 210 220 230 240 

SEQ ID 8552 (GBS206) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 51 (lane 6; MW 31.7kDa). 

GBS206-His was purified as shown in Figure 206, lane 6. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 326 

A DNA sequence (GBSx0356) was identified in S.agalactiae <SEQ ID 1057> which encodes the amino 
acid sequence <SEQ ID 1058>. Analysis of this protein sequence reveals the following: 

Possible site: 41

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -1.44 Transmembrane 183 - 199 ( 183 - 200)

Final Results

bacterial membrane --- Certainty=0.1574 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9729> which encodes amino acid sequence <SEQ ID 9730> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAG20117 GB:AE005090 NADH dehydrogenase/oxidoreductase-like 
protein; NolA [Halobacterium sp. NRC-1] 
Identities = 38/156 (24%) , Positives = 83/156 (52%) , Gaps = 13/156 (8%) 

Query: 19 TME I L IAGGSGFLGKQI I KAALTKGHKVAYLSRHEGKGDI FKDPRLTYI RGDITEADKIH 78 

+M++L+ GG+GF+G + + +GH V +R + D +T I GD+T + + 

Sbjct: 8 SMDVLVTGGTGFIGTHLCRELDDRGHDVTAFAREPADAALPAD- -VTRIVGDVTVKETVA 65 

Query: 79 LEDRTFDILIDCIGA IKPNQLD ELNVKATQKAVALCHKNQIPKLVYISA 127 

D +++ + KP4- D ++++ T+ VA + + ++ +SA 

Sbjct: 66 NAIDGHDAWNLVALSPLFKPSGGDSRHLDVHLGGTENWAAASEAGVEYILQLSALDAD 125 

Query: 128 NSGYSAYIKSKRKAEQIIKASGLDYLFVRPGLMYGE 163 
+G +AY+++K +AE+ +++S L + VRP +++G+ 

Sbjct: 126 I 



No corresponding DNA sequence was identified in S.pyogenes.

A related GBS gene <SEQ ID 8553> and protein <SEQ ID 8554> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 5
McG: Discrim Score: -7.99
GvH: Signal Score (-7.5): -6.34

Possible site: 41
>>> Seems to have no N-terminal signal sequence
ALOM program count: 1 value: -1.44 threshold: 0.0

INTEGRAL Likelihood = -1.44 Transmembrane 183 - 199 ( 183 - 200)
PERIPHERAL Likelihood =4.29 20

modified ALOM score: 0.79



*** Reasoning Step: 3 



WO 02/34771 



PCT/GB01/04789 



Final Results

bacterial membrane --- Certainty=0.1574 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

motif 68-70



The protein has homology with the following sequences in the databases: 

32.5/54.4% over 274aa 

Schizosaccharomyces pombe

GP|3395590| hypothetical protein Insert characterized

PIR|T41177|T41177 hypothetical protein SPCC1840.09 - fission yeast Insert characterized

ORF01216(358 - 990 of 1272)

GP|3395590|emb|CAA20132.1||AL031179(1 - 275 of 276) hypothetical protein
{Schizosaccharomyces pombe} PIR|T41177|T41177 hypothetical protein SPCC1840.09 - fission
yeast (Schizosaccharomyces pombe)

%Match =7.3
%Identity =32.4 %Similarity =54.3

Matches = 71 Mismatches = 88 Conservative Sub.s = 48 

144 174 204 234 264 294 324 354 

*L**ISTDS*K*A*IPFQGIMIINIATVLFGMI^*KFTK*IMKCPDVMT*NHTVTOY*TITLTRHIKISIL^QNEGEG 

25 

384 414 444 474 504 534 564 

TMEILIAGGSGFLGKQIIKAALTKGHKVAYLSRHEGKGDIFKDPRLTYIRGDITEADKIHLEDRTFDILIDCIGAI 

hi" Illlli I I I 1 = I I - = I :|| I hi i : = =11 = « s| I 
MKI WLGGSGFLGHNI CKLAIAKGYEWSVSRRGAGGLHNKE PWMDDVEWETLDAQK - - DPNSLLPVLRDASAVVNS VG 
30 10 20 30 40 50 60 70 

585 615 648 678 

KPNQLDELNVKATQKAV ALCHKNQIPKLVYIS 

|| : : I I :|: : | = | |=| 

35 imENNYKKILQHmGPVSffljINSLSSi™^^ 

90 100 110 120 130 140 150 

699 726 753 783 810 840 846 B76 

ANS GYSA-YIKSKRKAE-QIIKASGLDYLFVRPGLMYG-EERPLSIFQAKCIKLFSHL PFLGIWQKVF 

40 |:: | llhlhll =| I I I =1 = 111 = 11 =ll = = I = I = III = = 

AHAAAPGIjDPRYIKTKREAEREISKISNLRSIFLRPGFfmiF^ 

170 180 190 200 210 220 230 

930 960 990 1020 1050 1080 1110 

45 PTK-WIVA-EAIWTLRKKPTQKILSIEEU^*FIKKATVNSSFYSFTFPKSFS*VFFLSLLTAI*FKSSG*LXPGR* 
|:= | : | III | | = I == = =1 I =1= 

PSEEVALAALEAISDPSVKGPVE-ISELKSMAHK-FKQKSL 
250 260 270 

SEQ ID 8554 (GBS303) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 51 (lane 5; MW 28.3kDa). It was also expressed in E.coli as a GST-fusion
product. SDS-PAGE analysis of total cell extract is shown in Figure 55 (lane 5; MW 53.2kDa).

The GBS303-GST fusion product was purified (Figure 207, lane 6) and used to immunise mice. The
resulting antiserum was used for FACS (Figure 275), which confirmed that the protein is immunoaccessible
on GBS bacteria.

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 327

A DNA sequence (GBSx0357) was identified in S.agalactiae <SEQ ID 1059> which encodes the amino 
acid sequence <SEQ ID 1060>. Analysis of this protein sequence reveals the following: 

Possible site: 49

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2850 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database: 



Query: 1 MSKKNKI KKTLVDQILDKAKIEH DSLQLDALQGDLPNGIQKQDIFKTLALI 51 

M+KK +KT +++++ K+ + D L +++ L GI+K IFKTL + 

Sbjct: 1 MAKKKTQQKTNAMRMVEQHKVPYKEYEFAWSEDHLSAESVAESL - -GIEKGRI FKTLVTV 58 

Query: 52 GDKTGPIIGILPLTEHLSEKKLAKISGNKKVQMIPQKDLQKITGYIHGANNPIGIRQKHN 111 

G+KTGP++ ++P + L KICLAK SGNKKV+M+ KDL+ TGYI G +P G+ K 
Sbjct: 59 GNI<TGPWAVIPGNQELDLI<KLAI<ASGNKKVEMLIILI<DLEATTGYIRGGCSPTGM--KKQ 11S 



Query: 112 YPIFIDTIALEKQELIVSAGEIGRSIRINSEVLADFVNAKFADI 155 
+P ++ A + +IVSAG+ G I + E 4- N +FA+I

Sbjct: 117 FPTYLAEEAQQYSAIIVSAGKRGMQIELAPEAILSLTNGQFAEI 160 



A related DNA sequence was identified in S.pyogenes <SEQ ID 1061> which encodes the amino acid 
sequence <SEQ ID 1062>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2651 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below: 

Identities = 114/157 (72%) , Positives = 139/157 (87%) 

Query: 1 MSKKNKIKKTLVDQILDKAKIEHDSLQLDALCGDLPNGIQKQDIFKTLALIGDKTGPIIG 60 

M+KK K+KKTLV+QILDKA I H L+L+AL+GD P+ +Q DI+KTLAL GD+TGP+IG 
Sbjct: 1 MAKKTKLKKTLVEQILDKANIAHQGLKLNALEGDFPDDLQPSDIYKTLALTGDQTGPIjIG 60 

Query: 61 ILPLTEHLSEKia^ISGNKKVQMIPQKDLQKITGYIHGANNPIGIRQKHNYPIFIDTIA 120 

I+PLTEHLSEK+LAK+SGNKKV M+PQKDLQK TGYIHGANNP+GIRQKH+YPIFID A 
Sbjct: 61 1IPLTEHLSEKQLAKVSGNKKVSDWPQKDLQKTTGYIHGANNPVGIRQKHSYPIFIDQTA 120 

Query: 121 LEKQELIVSAGEIGRSIRINSEVLADFVNAKFADIKE 157 

LEK ++IVSAGE+GRSI+I+S+ LADFV A FAD+K+ 
Sbjct: 121 LEKGQI I VSAGEVGRS IKI SSQALADFVGASFADLKK 157 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 328 

A DNA sequence (GBSx0358) was identified in S.agalactiae <SEQ ID 1063> which encodes the amino
acid sequence <SEQ ID 1064>. Analysis of this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4719 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 8555> which encodes amino acid sequence <SEQ ID 8556> 
was also identified. This protein belongs to the glycolysis/gluconeogenesis pathway, and such proteins have 
been experimentally detected as surface-exposed in Streptococci. The protein has homology with the 
following sequences in the GENPEPT database: 



Query: 5 MKFYLVRHGKTQWNLEGRFQGANGDSPLLEFAIEELEELGQYLSSIHFDAVYSSDLGRAR 64 

MK YL+RHG+T WN +G +QG D PL E E+ +L L + DA+YSS L R+ 
Sbjct: 1 MKLYLIRHGETIWNEKGLWQGVT-DVPLNERGREQARKLANSLKRV--DAIYSSPLKRSL 57 

Query: 65 DTVNILNDANSCPKEIHYTPQLREWALGTLEGCKIATMQAIYPRQMTAFYQNPLQFKHDM 124 

+T + A KEI LRE + G + YP + + +P M 

Sbjct: 58 ETAEEI- -ARRFEKEIIVEEDLRECEISLWNGLTVEEAIREYPVEFKKWSSDP- --NFGM 112 

Query: 125 FGAESLYQTTHRVESFLRSLASK NYDKVLIVGHGANLTASIRSLLGYQYGSLHYKD 180 

G ES+ +RV + + S+ + V+IV H +L A I +LG LH 

Sbjct: 113 EGLESMRNVQNRWKAIMKIVSQEKLNGSENWIVSHSLSLRAFICWILGLPL-YLHRNF 171 

Query: 181 KLDNASLTIIE 191 

KLDNASL+++E 
Sbjct: 172 KLDNASLSWE 182 

A related DNA sequence was identified in S.pyogenes <SEQ ID 1065> which encodes the amino acid 
sequence <SEQ ID 1066>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3528 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 127/205 (61%) , Positives = 152/205 (73%) 

Query: 5 MKFYLVRHGKTQWNLEGRFQGANGDSPLLEEAIEELEELGQYLSSIHFDAVYSSDLGRAR 64 

MK Y VRHGKT WNLEGRFQGA GDSPLLEEA +E+ LG+ L+ + FDAVY+SDL RA 
Sbjct: 1 MKLYFvRHGKTLWNLEGRFQGAGGDS PLLEEAKDE I HLLGKELAKVAFDA VYTSDLQRAM 60 

Query: 65 DTVNILNDANSCPKEIHYTPQLREWALGTLEGCKIATMQAIYPRQMTAFYQNPLQFKHDM 124 

T 1+ DA ++++T QLREW LG LEG KIATM AIYP+QM AF +N QFK D 

Sbjct: 61 ATAAIILDAFDQQPKLYHTDQLREWRLGKLEGAKIATMAAIYPQQMLAFRENLAQFKPDQ 120 

Query: 125 FGAESLYQTTHRVESFLRSLASKNYDICVLIVC-HGANLTASIRSLLGYQYGSLHYKDKLDN 184 

F AES+YQTT RV ++S K+Y VLIVGHGANLTA+IRSLLG++ L K LDN 
Sbjct: 121 FEAESIYQTTQRVCHLIQSFKDKHYQtlVLIVGHGANLTATIRSLLGFEPALLIjAKGGLDN 180 

Query: 185 ASLTI IETHDFKDFNCLTWNDKSYL 209 

ASLTI+ET D+ ++CL WNDKS+L 
Sbjct: 181 ASLTILETKDYLTYDCLIWNDKSFL 205 

SEQ ID 8556 (GBS314) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 51 (lane 4; MW 27.2kDa), in Figure 169 (lane 15-17; MW 41.6kDa) and in 
Figure 239 (lane 4; MW 41.6kDa). It was also expressed in E.coli as a GST-fusion product. SDS-PAGE 
analysis of total cell extract is shown in Figure 55 (lane 4; MW 52.1kDa). 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 329 

A DNA sequence (GBSx0359) was identified in S.agalactiae <SEQ ID 1067> which encodes the amino 
acid sequence <SEQ ID 1068>. Analysis of this protein sequence reveals the following: 

Possible site: 56

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3014 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAB12562 GB:Z99108 similar to hypothetical proteins [Bacillus subtilis] 
Identities = 69/232 (29%) , Positives = 108/232 (45%) , Gaps = 9/232 (3%) 

Query: 4 SIVFDVDDTIYDQQAPYRIAVEKCFPDFDMSAINQAYIRFRHYSDIGPPRVMAGEWTTEY 63 

+++FDVDDTI D QA +A+ F D ++ N +++ + + G+ T + 

Sbjct: 6 TLLFDVDDTILDFQAAEAI^RLI,FEDQNIPLTNDMKAQYKTINQGLWRAFEEGKMTRDE 65 

Query: 64 FRFWCKEmLEFGYREIDEATGIYFQEIYEHELENITMIiDEMRMTLDFLKSKNVPMGII 123 

R L E+GY EAG++YLE L+ L ++I + 

Sbjct: 66 WNTRFSALLKEYGY EADGALLEQKYRRFLEEGHQLIDGAFDLISNLQQQFDLYIV 121 

Query: 124 TNGPTEHQLKKVKKLGLYDYVDPKRVIVSQATGFQKPEKEIFNLAAEQF-DMNPSTTLYV 182 

TNG + Q K+++ GL+ + K + VS+ TGFQKP KE FN E+ + TL + 
Sbjct: 122 TNGVSHTQYKRLRDSGLFPFF--KDIFVSEDTGFQKPMKEYFNYVFERIPQFSAEHTLII 179 

Query: 183 GDSYDNDIMGAFNGGWHSMWFNHRGRSLKPGIKPVYDVAIDNFEQLFGAVKV 234 

GDS DIG G+WN + PIPY+I E+L+ + + 
Sbjct: 180 GDSLTADIKGGQIAGIDTCWMNPDMKPNVPEIIPTYE--IRKLEELYHILNI 229 
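
Each database hit is introduced by a summary line giving the identity, positive and gap counts over the aligned length. As an illustrative sketch only (Python and the function name `parse_summary` are assumptions, not part of the original analysis), such a line can be split into its counts and reported percentages as follows. The percentages are read as printed and are not recomputed.

```python
import re

# Hypothetical helper: extracts "name = matches/length (percent%)" fields
# from a BLAST-style summary line. Spaces around punctuation are tolerated.
FIELD = re.compile(
    r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)\s*\(\s*(\d+)\s*%\s*\)"
)


def parse_summary(line):
    """Return {field: (matches, aligned_length, reported_percent)}."""
    return {
        name: (int(matches), int(length), int(percent))
        for name, matches, length, percent in FIELD.findall(line)
    }


summary = "Identities = 69/232 (29%) , Positives = 108/232 (45%) , Gaps = 9/232 (3%)"
for name, (matches, length, percent) in parse_summary(summary).items():
    print(f"{name}: {matches} of {length} aligned positions ({percent}% as reported)")
```

Fields that are absent from a summary line (for example, alignments reported without a Gaps entry) are simply missing from the returned dictionary.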

A related DNA sequence was identified in S.pyogenes <SEQ ID 1069> which encodes the amino acid 
sequence <SEQ ID 1070>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3216 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 276/300 (92%) , Positives = 292/300 (97%) 

Query: 1 MITSIVFDVDDTIYDQQAPYRIAVEKCFPDFDMSAINQAYIRFRHYSDIC-FPRVMAGEWT 60 

MIT+ IVFDVDDTIYDQQAPYRIA+EKCFPDFDMS +NQAYIRFRHYSD+GFPRVMAGEWT 
Sbjct: 1 MITAIVFDVDDTIYDQQAPYRIAMEKCFPDFDI^SVMNQAYIRFRHYSDVGFPRVMAGEWT 60 



Query: 61 TEYFRFWRCKETLLEFGYREIDEATGIYFQEIYEHELENITMLDEMRMTLDFLKSKNVPM 120
TEYFRFWRCKETLLEFGYREIDEA G+ + FQE+YEHELENITMLDEMRMTLDFLKSKNVPM
Sbjct: 61 TEYFRFWRCKETLLEFGYREIDEAAGVHFQSVYEHELENITMLDEMRMTLDFLKSKNVPM 120

Query: 121 GIITNGPTEHQLKKVKKIiGLYDYVDPKRVIVSQATGFQKPEKEIFNLftAEQFDMNPSTTL 180 

GIITNGPTEHQLKKV+KLGLYDY+D KRVIVSQATGFQKPEKEIFNLAAEQFDMNP TTL 
Sbjct: 121 GIITNGPTEHQLKKVRKLGLYDYIDAKRVIVSQATGFQKPEKEIFHLAAEQFDMNPQTTL 180 

Query: 181 YVGDSYDNDIMGAFNGGWHSMWFKHRGRSLKPGIKPVYDVAIDNFEQLFGAVIWLFDLPD 240 

WGDSYDNDIMGAFNGGWHSMWFNHRGR LKPG KPVYDVAIDNFEQLFGAVKVLFDLPD 
Sbjct: 181 WGDSYDNDIMGAFKGGWHSMWFNHRGRQLKPGTKPVYDVAIDNFEQLFGAVKVLFDLPD 24 0 

Query: 241 NKFIFDINDKSNPVLEMGIOTGLMMAAERLIjESNMSVDICWILLRLTAKQEIOTLRMKYAR 300 

NKFIFD+NDK NP4L+MG+NNGLMMAAERLLESNMS+DKWILLRLT +QEKVLR4-KYAR 
Sbjct: 241 NKFIFDVNDKKNPILQMGINNGLMMAAERLLESNKSIDKWILLRLTKQQEKVLRLKYAR 300 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 330 

A DNA sequence (GBSx0360) was identified in S.agalactiae <SEQ ID 1071> which encodes the amino 
acid sequence <SEQ ID 1072>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2451 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9727> which encodes amino acid sequence <SEQ ID 9728> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database:

>GP:CAB11858 GB:Z99104 lysyl-tRNA synthetase [Bacillus subtilis] 
Identities = 318/490 (64%), Positives = 390/490 (78%), Gaps = 1/490 (0%) 

Query: 44 EEIjNDQQIVRREKMAALTEQGIDPFGKRFERTATSGQLNEICfADKSKEDLHDIEETATIA 103 
EELNDQ VRR+KM L + GIDPFG RFERT S ++ YD +KE+L + TIA

Sbjct: 9 EELNDQLQVRRDKMNQLRDNGIDPFGARFERTHQSQEVISAYQDLTKEELEEKAIEVTIA 68 

Query: 104 GRLMTKRGKGKVGFAHIQDREGQIQIYVRKDSVGEENYEIFKKADLGDFLGVEGQVMRTD 163 
GR+MTKRGKGK GFAH+QD EGQIQIYVRKDSVG++ YEIFK +DLGD +GV G+V +T+ 
Sbjct: 69 GRMMTKRGKGKAGFAHLQDLEGQIQIYVRKDSVGDDQYEIFKSSDLGDLIGVTGKVFKTN 128

Query: 164 MGELSIKATHITHLSKALRPLPEKFHGLTDIETIYRKRHLDLISNRDSFDRFVTRSKIIS 223 

+GELS+KAT L+KALRPLP+K+HGL D+E YR+R+LDLI N DS F+TRSKII 
Sbjct: 129 VGELSVKATSFELLTKALRPLPDKYHGLKDVEQRYRQRYLDLIVNPDSKHTFITRSKIIQ 188 


Query: 224 EIRRFMDSNGFLEVETPVLHNEAGGASARPFITOHN7AQDIDMVLRIATELHLKRLIVGGM 283 

+RR++D +G+LEVETP +H+ GGASARPFITHHNA DI + +RIA ELHLKRLIVGG+ 
Sbjct: 189 AMRRYLDDHGYLEVETPTMHSIPGGASARPFITHHNALDIPLYMRIAIELHLKRLIVGGL 248 

Query: 284 ERVYEIGRIFRNEGMDATHNPEFTSIEAYQAYADYQDIMDLTEGIIQHVTKTVKGDGPIN 343

E+VYEIGR+FRNEG+ HNPEFT IE Y+AYADY+DIM LTE ++ H+ + V G I 
Sbjct: 249 EKVYEIGRVFRNEGVSTRHNPEFTMIELYEAYADYKDIMSLTENLVAHIAQEVLGTTTIQ 308

Query: 344 YQGTEIKIlffiPFKRVHMvDAWEITGIDFWKE^LEEAQAIAQEKNVPLEKHFTTVGHII 403 
Y +1 + +KR+HMVDAVKE TG+DFW+E+T+E+A+ A+E V + K TVGHII

Sbjct: 309 YGEEQIDLKPEWKRIHMVDAVKEaTGVDF^EVWEQAREYAECEHEVEI-KDSMTVGHII 367 

Query: 404 NAFFEEFVEDTLIQPTWFGHPvWSPLAKKNDTDPRFTDRFELFIMTKEYANAFTELND 463 
N FFE+ +E+TLIQPTF++GHPVE+SPLAKKN DPRFTDRFELFI+ +E+ANAFTELND 
Sbjct: 368 NEFFEQKIEETLIQPTFIYGHPVEISPLAKKNPEDPRFTDRFELFIVGREHANAFTELND 427

Query: 464 PIDQLSRFEAQASAKELGDDEATGVDYDYVEALEYGMPPTGGLGIGIDRLCMLLTDTTTI 523

PIDQ RFEAQ +E G+DEA +D D+VEALEYGMPPTGGLGIGIDRL MLLT+ +1 
Sbjct: 428 PIDQRERFEAQLKEREAGNDEAHLMDEDnTSALEYGMPPTC-GLGIGIDRIiVMLLraAPSI 487 

Query: 524 RDVLLFPTMK 533 

RDVLLFP M+ 
Sbjct: 488 RDVLLFPQMR 497 
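
The "Identities" count in each summary is the number of alignment columns in which the Query and Sbjct residues are the same. A minimal sketch of that count, assuming Python and a hypothetical function `identity_midline`, is given below. It marks identical residues only; the "+" symbols in the match lines denote conservative substitutions scored with a substitution matrix, which this sketch does not attempt to reproduce.

```python
def identity_midline(query, sbjct):
    """Return (midline, identical) for two aligned rows of equal length.

    The midline repeats a residue where both rows agree and is blank
    elsewhere. Gap characters ("-") never count as identities.
    """
    if len(query) != len(sbjct):
        raise ValueError("aligned rows must have the same length")
    midline = "".join(
        q if q == s and q != "-" else " " for q, s in zip(query, sbjct)
    )
    identical = sum(1 for column in midline if column != " ")
    return midline, identical


# Final row of the alignment above (Query 524-533, Sbjct 488-497).
midline, identical = identity_midline("RDVLLFPTMK", "RDVLLFPQMR")
print(repr(midline), identical)
```

Summing the per-row counts over every row of an alignment gives the numerator of the Identities fraction, and the total number of columns gives the denominator.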

A related DNA sequence was identified in S.pyogenes <SEQ ID 1073> which encodes the amino acid 
sequence <SEQ ID 1074>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4694 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 
Identities = 439/500 (87%) , Positives = 474/500 (94%) 



LEE MSNQHIEELNDQQIVRREKM AL EQGIDPFGKRF+RTA S +L EKYADK+KE+L 
Sbjct: 1 LEElMSNQHIEEIiNDQQIVRREKMTALAEQGIDPFGKRFDRTANSAELKEKYADKTKEEL 6 



Query: 154 GVEGQVMRTDMGELSIKATHITHLSKALRP^PEKFHGLTDIETIYRKRHLDLISIslRDSFD 213 

G VEG+ VMRTDMGELS I KAT +THLSK+LRPLPEKFHGLTDIETIYRKRHLDLISNR+S FD 
Sbjct: 121 GVEGEVMRTDMGELSIKATKLTHLSKSLRPLPEKFHGLTDIETIYRKRHLDLISKRESFD 180 

Query: 214 RFVTRSKIISEIRRFMDSNGFLEVETPVLHNEAGGASARPFITHHNAQDIDMVLRIATEL 273 

RF\7TRSK+ISEIRR++D FLEVETPVLHNEAGGA+ARPF+THHNAQ+IDMVLRIATEL 
Sbjct: 181 RFVTRSKMISEIRRYLDGLDFLEWPVLHNEAGGAAARPFVTHHNAQNIDMVLRIATEL 240 

Query: 274 HLKRLIVGGMERVYEIGRIFRNEGMDATHNPEFTSIEAYQAYADYQDIMDLTEGIIQHVT 333 

HLKRLIVGGMERVYEIGR1FRNEGMDATHNPEFTSIE YQAYADY DIM+LTEGIIQH 
Sbjct: 241 HLKRLIVGGMERVYEIGRIFRNEGMDATHNPEFTSIEVYQAYADYLDIMNLTEGIIQHAA 300 

Query: 334 KWKGDGPIOTQGTEIKINEPFKRVHICnTOAVKEITGIDFWKEMTLEEAQALAQEKNVPLE 393 

K V+GDGPI+YQGTEI+INEPFICRVHMVDA+KE+TG DFW EMT+EEA ALA+EK VPLE 
Sbjct: 301 KAWGDGPIDYCGTEIRINEPFKRVHMVDAIKEVTGADFWPEMrArEEAIALAKEKQVPLE 360 

Query: 394 KHFTTVGHIINAFFEEFVEDTLIQPTFVFGHPVEVSPIiAKKNDTDPRFTDRFELFIMTKE 453 

KHF +VGHI INAFFEEFVE+TL+QPTFVFGHPVEVS PLAKKN D RFTDRFELFIMTKE 
Sbjct: 361 KHFISVGHIINAFFEEFVEETLVQPTFVFGHPVEVSPLAKKNPED1RFTDRFELFIMTKE 420 

Query: 454 YANAFTELNDPIDQLSRFEAQASAKELGDDEATGVDYDYVEALEYGMPPTGGLGIGIDRL 513 

YANAFTELNDP1DQLSRFEAQA AKELGDDEATG+DYD+VEALEYGMPPTGGLGIGIDRL 
Sbjct: 421 YANAFTELNDPIDQLSRFEAQAQAKELC-DDEATGIDYDFVEALEYGMPPTGGLGIGIDRL 480 

Query: 514 CMLLTDTTTIRDVLLFPTMK 533 

CMLLT+TTTIRDVLLFPTMK 
Sbjct: 481 CMLLTNTTTIRDVLLFPTMK 500 



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 331

A DNA sequence (GBSx0361) was identified in S.agalactiae <SEQ ID 1075> which encodes the amino 
acid sequence <SEQ ID 1076>. This protein is predicted to be 6,7-dimethyl-8-ribityllumazine synthase 
(ribH). Analysis of this protein sequence reveals the following: 

Possible site: 34

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1042 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAB14257 GB:Z99116 riboflavin synthase (beta subunit) [Bacillus subtilis] 
Identities = 103/151 (68%) , Positives = 120/151 (79%)

Query: 1 MTIIEGQLVANEMKIGIWSRFNELITSKLLSGAVDGLLRHGVSEEDIDIVWVPGAFEIP 60
M II+G LV +KIGIW RFN+ ITSKLLSGA D LLRHGV DID+ WVPGAFEIP
Sbjct: 1 MNIIQGNLVGTGLKIGIWGRFNDFITSKLLSGAEnALLRHGVDTNDIDVAWVPGAFEIP 60

Query: 61 YMARKMALYKDyDAIICLGWIKGSTDHYDYVC^ffiVTKGIGHLNSQSDIPHIFGVIjTTDN 120 

+ A+KMA K TOAII LG VI+G+T HYDYVCNE KGI + + +P IFG++TT+N 
Sbjct: 61 FAAKKMAETKKYDAIITLGWIRGATTHVDWCNEAAKGIAQAANTTGVPVIFGIVTTEN 120 

Query: 121 IEQAIERAGTKAGNKGVDCALSAIEMVNLDK 151
IEQAIERAGTKAGNKGVDCA+SAIEM NL++
Sbjct: 121 IEQAIERAGTKAGNKGVDCAVSAIEMANLNR 151

No corresponding DNA sequence was identified in S. pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 332 

A DNA sequence (GBSx0362) was identified in S.agalactiae <SEQ ID 1077> which encodes the amino
acid sequence <SEQ ID 1078>. This protein is predicted to be GTP cyclohydrolase II (ribA/B). Analysis of
this protein sequence reveals the following:

Possible site: 20 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1918 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9725> which encodes amino acid sequence <SEQ ID 9726>
was also identified.

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAA86524 GB:U27202 GTP cyclohydrase II/3,4-dihydroxy-2-butanone-4-phosphate synthase
[Actinobacillus pleuropneumoniae]
Identities = 230/395 (58%), Positives = 307/395 (77%)



Query: 19 FSPIKIOjLQDIKSGKMVvLMDDENRENEGDLICAAEMVTKESINFMAKFGKGLICLPLSN 78 

FS ++ ++ 1+ GK++++ DDE+RENEGD ICAAE T E+INFMA +GKGLIC P+S 
Sbjct: 6 FSKVEDMEAIRQGKIILVTDDEDRENEGDFICAAEFATPENINFMATYGKGLICTPIST 65 

Query: 79 YYTffiKLELAQMASHMDITOETAFTISIDHLSTSTGISAEDRALTAKMVANDSSKaKDFRR 138

A+KL M + N DNHETAFT+S+DH+ T TGISA +R++TA + +D++KA DFRR 
Sbjct: 66 EIAKKLNFHPMVAWQDNHETAFTVSVDHIDTGTGISAFERSITAMKIVDDNAKATDFRR 125 

Query: 139 PGHLFPLIAKEGGVIARWGHTFATVDLCRLAGLKECGLCCE1MAEDGSMMRKDELIAFAQ 198

PGH+FPL+AKEGGVL RNGHTEATVDL RLAGLK GLCCE IMA+DG+MM +L FA 
Sbjct: 126 PGHMFPLIAKEGGVLVRNGHTEATVDLARLAGBKHAGLCCEIMADDGTMMTMPDLQKFAV 185 

Query: 199 KHDLAIATIKQLQDYRRQEEGGVWEIEIQLPTQFGHFTAYGYSEWMIKEHVALVKGDI 258 

+H++ TI+QLQ+YRR+ + V + +++PT++G F A+ + EV++ KEHVALVKGD+ 
Sbjct: 186 EHNMPFITIQQLQEYRRKHDSLVKQISWKMPTKYGEFMAHSFVEVISGKEHVALVKGDL 245 



Query: 319 KLKAYHLQEEGLDTLEANLALGFEGDERDYGVSAQLLKDLGINSIMLLTNNPDKIQQLEA 378 

KL+AY LQ++G+DT+EAN+ALGF+ DER+Y + AQ+ + LG+ SI LLTNNP KI+ L+ 
Sbjct: 306 KLRAYELQDKGMDTVEANVALGFKEDEREYYIGAQMFQQLGVKSIRLLTNNPAKIEGLKE 365 

Query: 379 EGICVKNRVPLQVAVTAYDLNYLKTKKEKMGHLLD 413 

+G+ + R P+ V D++YLK K+ KMGH+ + 

Sbjct: 366 QGLNIVAREPI IVEPNKNDIDYLKVKQIKMGHMFN 400 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.



Example 333 

A DNA sequence (GBSx0363) was identified in S.agalactiae <SEQ ID 1079> which encodes the amino 
acid sequence <SEQ ID 1080>. This protein is predicted to be riboflavin synthase alpha chain (ribE). 
Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3517 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9723> which encodes amino acid sequence <SEQ ID 9724> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:BAB05274 GB:AP001512 riboflavin synthase alpha subunit [Bacillus halodurans] 
Identities = 98/216 (45%), Positives = 147/216 (67%), Gaps = 2/216 (0%) 

Query: 1 MFTGIIEEMGQVSRIRNGIKSQQLSIDAPKLVPLLRKGDSVAVNGVCLTVLDKSETAFIA 60 

MFTGIIE++G + 1+ ++ ++I + K+V ++ GDS+AVNGVCLTV ++T F 
Sbjct: 1 MFTGIIEDVGTIDAIQQTGEAIVMTITSKKIVSDVQLGDSIAVNGVCLTVTSFTDTQFTV 60 

Query: 61 DVMPES^TOTSLAALRLHSK^mLEIJU 1 RSDSRLGGHFVLGHVDGVGKIEKIQKDDIAVRF 120 

D+MPE++ TSL L S+VNLE A+ ++ R GGH V GHVDG+G I K ++ D AV + 
Sbjct: 61 DLMPETVRATSLRLLSKGSRVNLERAMVANGRFGGHIVSGHVDGIGTIRKKERKDNAVYY 120 

Query: 121 SID&PPSIMSYIIEK3SVRLDGISLTWSFTEHSFEVSVIPHTM&QTNLSLKKVGDLLNI 180 

+1+ S+ Y+I KGSVA+DG SLT+ ++ +F +S+IPHTM +T + LKK GD++NI 
Sbjct: 121 TIEVSSSLRRYMIHKGSVAVDGTSLTIFDVSDKTFTISIIPHTMEETIIGLKKAGDIVNI 180 

Query: 181 EVDVLGKYAEKFLAPTNRTNHTSSVMDWSFIBENGy 216

E D++GKY E+F+ N + +FL+E+GY 

Sbjct: 181 ECDLIGKYIEQFVQQGKPVNEGG- -LTKAFLTEHGY 214 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 334 

A DNA sequence (GBSx0364) was identified in S.agalactiae <SEQ ID 1081> which encodes the amino 
acid sequence <SEQ ID 1082>. This protein is predicted to be riboflavin-specific deaminase (ribD). 
Analysis of this protein sequence reveals the following: 
Possible site: 30 

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -1.01 Transmembrane 307 - 323 ( 307 - 323)

Final Results

bacterial membrane --- Certainty=0.1404 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>
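
Where a membrane localisation is predicted, the supporting evidence is given as INTEGRAL lines, each listing a likelihood score and the residue range of a predicted transmembrane segment. The sketch below, assuming Python and a hypothetical function `parse_integral`, extracts the score and the segment boundaries from such a line. The trailing parenthesised range is ignored because it is truncated in several places in this text.

```python
import re

# Hypothetical helper: reads lines of the form
#   INTEGRAL Likelihood = -1.01 Transmembrane 307 - 323 ( 307 - 323)
INTEGRAL = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+(?:\.\d+)?)\s+"
    r"Transmembrane\s+(\d+)\s*-\s*(\d+)"
)


def parse_integral(line):
    """Return (likelihood, start, end), or None if the line does not match."""
    match = INTEGRAL.search(line)
    if match is None:
        return None
    return float(match.group(1)), int(match.group(2)), int(match.group(3))


line = "INTEGRAL Likelihood = -1.01 Transmembrane 307 - 323 ( 307 - 323)"
likelihood, start, end = parse_integral(line)
print(f"segment {start}-{end}, {end - start + 1} residues, likelihood {likelihood}")
```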

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAA86522 GB:U27202 riboflavin-specific deaminase [Actinobacillus 
pleuropneumoniae] 
Identities = 182/353 (51%), Positives = 259/353 (72%) 



[Alignment residue rows missing; row start positions: Query 6, 66, 126, 186, 246, 306; Sbjct 51, 111, 171, 231, 291. Surviving match lines:]

G+T YVTLEPCCH G+ PPC++ LI+ GIXKV +GS DPNPLV+G+G LR+ G+ V
G+L+EECDALN F ++ K+P+V +KYAMT DGKIAT +G+SKWI+ E +R VQ+ I
h SAIMVG++TVLADNP L R+P + VRIVCDSQL+ PLD
V +G ++ ++ ++DL+ L+ LG+ IDSLL+EGGSSL+FSAL++



A related DNA sequence was identified in S.pyogenes <SEQ ID 1083> which encodes the amino acid 
sequence <SEQ ID 1084>. Analysis of this protein sequence reveals the following: 

Possible site: 25

>>> Seems to have no N-terminal signal sequence

- 104 ( 88 - 105)

Final Results

bacterial membrane --- Certainty=0.1468 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the databases: 



Query: 13 LEEQTYFMQEALKEAEKSLQKAEIPIGCVIVKDGEIIGRGHNAREESNQAIMHAEMMAIN 72 

+ + +M+EA+KEA+K+ +K E+PIG V+V +GEII R HN RE ++I HAEM+ 1 + 
Sbjct: 1 MTQDELYMKEAIKEAKKAEEKGEVPIGAVLVIN3E 1 I ARAHNLRETEQRS IAHAEMLVID 60 

Query: 73 EANAHEGNWRLLDTTLFVTIEPCVMCSGAIGLARIPHVIYGASNQKFGGVDSLYQILTDE 132 

EA G WRL TL+VT+EPC MC+GA+ L+R+ V++GA + K G +L +L +E 
Sbjct: 61 EACKALGTWRLEGATLYVTLEPCPMC^GAWLSRVEKVVFGAFDPKGGCSGTLMNLLQEE 120 

Query: 133 RLIOTRVQVERGLLAADCANIMQTFFRQGRERKXIAKHLIKE 173 
R NH+ +V G+L +C ++ FFR+ R++KK A+ + E 

Sbjct:



An alignment of the GAS and GBS proteins is shown below: 

Identities = 48/146 (32%) , Positives = 71/146 (47%) , Gaps

Query: 7 YMALALKEAEKGMGFVAPNPLVGAVIVKDDRIISKGYHKRFGD LHAERQAIKNADE 62
+M ALKEAEK + A P +G VIVKD II +G++ R +HAE AI A+
Sbjct: 19 [start of row missing] IGCVIVKDGEIIGRGHNAREESNQAIMHAEMMAINEANA 76

Query: 63 [row missing]
+ +TL+VT+EPC
Sbjct: 77 [row missing]


Query: 118 KEGLN VEVGILREECDALNERF 139 

E LN VE G+L +C + + F 
Sbjct: 131 DERLNHRVQVERGLLAADCANIMQTF 156 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.


Example 335 

A DNA sequence (GBSx0365) was identified in S.agalactiae <SEQ ID 1085> which encodes the amino 
acid sequence <SEQ ID 1086>. This protein is predicted to be Nramp metal ion transporter. Analysis of this 
protein sequence reveals the following: 



Possible site: 43 

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood =-11.89 Transmembrane 169 - 185 (
INTEGRAL Likelihood =-11.09 Transmembrane 140 - 156 ( 128 - 165)
INTEGRAL Likelihood = -6.85 Transmembrane 359 - 375 ( 354 - 379)
INTEGRAL Likelihood = -6.48 Transmembrane 269 - 285 ( 263 - 287)
INTEGRAL Likelihood = -6.16 Transmembrane 426 - 442 (
INTEGRAL Likelihood = -5.57 Transmembrane 62 - 78 (
INTEGRAL Likelihood = -4.94 Transmembrane 107 - 123 (
INTEGRAL Likelihood = -4.46 Transmembrane 391 - 407 (
INTEGRAL Likelihood = -4.35 Transmembrane 310 - 326 (

Final Results

bacterial membrane --- Certainty=0.5755 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAF83825 GB:AE003939 manganese transport protein [Xylella 



WO 02/34771 



PCT/GB01/04789 



, Positives = 274/436 (62%) , Gaps = 14/436 (3%) 

Query: 10 SLSEVNQSVEVPHNSSFWWTLRAFLGPGALVAVGYMDPGNMITSVIGGATyRYLLLFVVL 69 
SL E++ SV V + L AFLGPG +V+VGYMDPGNW T + GG+ + Y+LL V+L
Sbjct: 39 SLGEMHASVAVSRRGHWGFRLLAFLGPGYWSVGYMDPGNWATGLAGGSRFGYMLLSVIL 98

Query: 70 VSSLMAMQLQQMAGKLGIVTRQDLAQATASRLPKPLRYLljFIIIELALIATDLAEVIGSA 129 
+S++MA+ LQ +A +LGI + DIAQA +R + L+++ ELA+IA DLAEVIG+A 

Sbjct: 99 LSNVmiVLQALAARLGIASDMDIiAQACRARYSRGTTLALVm/CEIAIIACDLAEVIGTA 158

Query: 130 IALHLLFGWPLLLSIMITILDVFLLLLLMKLGVQKIEAFVSVLILTILIIFTYLWLSQP 189 

IAL+LL G £>++ ++IT +DV L+LLLM G + +FAFV L+L I F +VL+ P 
Sbjct: 159 IALNLLLGVPIIWGWITAVDWLVLLLMHRGFRALFAFVIALLLVIFGCFWQIVLAAP 218 


Query: 190 DLDAMFKGFLPHHELFNISHEGKNSPLTLALGIIGATVMPHNLYLHSSLSQTRRVDYHNK 249 

L + GF+P ++ L LA+GI+GATVMPHNLYLHSS+ QTR 

Sbjct: 219 PLQEVLGGFVPRWQW ADPOALYLAIGI VGATVMPHNLYLHSS I VQTRAYP - RTP 272 

Query: 250 SSIKKAWFMTLDSNIQLSIAFVVNSLLLVLGASLFYG-HANDISAFSQMYLALSDKTIT 308

+ A+R+ DS + L LA +N+ +L+L A++F+ H D+ Q Y h+ 
Sbjct: 273 VGRRSALRWAVADSTIAMIALFINASILIIAAAVFHAQHHFDVEEIEQAYQLLAPVLGV 332 

Query: 309 GAVASSFLSTLFAVALLASGQNSTITGTLTGQrVMEGFLHFKLPQWLIRLCTRLLTLLPI 368 
G A TLFA ALLASG NST+T TL GQIVMEGFL +L WL R+ TR L ++P+

Sbjct: 333 GVAA. TLFATALl^SGINSTVTATLAGQIVIffiGFLRLRLRPWLRRVLTRGLAIVPV 387 

Query: 369 FVIALLVGGEENTLDQLIVYSQVFLSLALPFSIFPLIYFTSQKSIMGEHANAKIMTYLAY 428 
V+ L G E +L++ SQV LS+ LPF++ PL+ + + +MG +W +A+ 

Sbjct: 388 IVWALYG--EKTGRLLLLSQVILSMQLPFAVIPIiRCVADRKVMGALVAPRWLMVVAW 445

Query: 429 LVAIILTLLNLKLIMD 444 

L+A ++ +LN+KL+ D 
Sbjct: 446 LIAGVIWLNVKLLGD 461 


No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 336 

A DNA sequence (GBSx0366) was identified in S.agalactiae <SEQ ID 1087> which encodes the amino
acid sequence <SEQ ID 1088>. Analysis of this protein sequence reveals the following:

Possible site: 54 

>>> Seems to have a cleavable N-term signal seq.

INTEGRAL Likelihood =-14.12 Transmembrane 113 - 129 ( 98 -
INTEGRAL Likelihood =-12.15 Transmembrane 228 - 244 ( 220 -
INTEGRAL Likelihood =-10.83 Transmembrane 175 - 191 ( 167 -
INTEGRAL Likelihood = -5.04 Transmembrane 57 - 73 ( 55 -
INTEGRAL Likelihood = -3.93 Transmembrane 146 - 162 ( 142 -
INTEGRAL Likelihood = -1.38 Transmembrane 199 - 215 ( 199 - 215)
INTEGRAL Likelihood = -0.32 Transmembrane 82 - 98 ( 82 - 98)

Final Results

bacterial membrane --- Certainty=0.6647 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAF11325 GB:AE002018 hypothetical protein [Deinococcus radiodurans] 
Identities = 63/215 (29%), Positives = 108/215 (49%), Gaps = 13/215 (6%) 

Query: 11 LLLVFILTIIVNYLSATGFLTGNSQKSLSDRYQTLLTPAPLAFSIWSVIYL-LTFLVILR 69 
LL +LT++VNYLS L GNS +SDR TPA L F++W I+L L + +

Sbjct: 10 LLRATVLTLVVOTLSNKLPLFGNSNMVSDRLPNAFTPAGLTFIWGPIFLGLLVPAVYQ 69 

Query: 70 AIFSKSQSYQDNFASIFPYFLGLLLVlWIWTVFFTSNLIGLSTIIIFAyCILLV-IIIKI 128 
5 A+ ++ + D +P+ LG I! N W + F S IGLS +1+ A +LV + + + 

Sbjct: 70 ALPAQRGARLDRL- -FWPFLLGNLL-NVAWLIjAFQSLNIGLSWIMLALLAVLVRLYLSV 126 

Query: 129 LS---KNKSKLLLRITFGIHAGWLLVASLVNIAVYLVKI DFNYPLPKVYIAI IALI 181 

S + + L++ ++ W+ VA++ W+ +LV F V+ A++ ++ 

Sbjct: 127 RSLPPQGAERWTLQIiPVSDYIiAWISVATIANITAFLVSAGVTQSFLGIAGPVWSALLIiW 186

Query: 182 FITVLSLYIARVLQNAYLILSVFWAWLMVFKAHLE 216 

+ +L R A+ + + WA+ V+ A E 
Sbjct: 187 AAAIGVFFLWRFRDYAFAAV-LLWAFYGVYVARPE 220 


No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 337 

A DNA sequence (GBSx0367) was identified in S.agalactiae <SEQ ID 1089> which encodes the amino
acid sequence <SEQ ID 1090>. Analysis of this protein sequence reveals the following:

Possible site: 36 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3401 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database:

>GP:AAC65352 GB:AE001215 T. pallidum predicted coding region
TP0352 [Treponema pallidum] 
Identities = 28/64 (43%) , Positives = 41/64 (63%) 

Query: 3 EFTFEIVEKLLVLSF^KGWTKSIJ^VSFNGAPAKFDLRTWSPDHTKMGKGITLSNEEFK 62

+F +E+ LS + GW+ EL +S+NG P K+D+R WSPD +KMGKG+TL+ E 

Sbjct: 12 DFHYEVTPUWGTLSTSGNGWSLELKSISWNGRPEKYDIRAWSPDKSKMGKGVTLTRAEIV 71 

Query: 63 VILD 66 
+ D

Sbjct: 72 ALRD 75 

A related DNA sequence was identified in S.pyogenes <SEQ ID 1091> which encodes the amino acid 
sequence <SEQ ID 1092>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4021 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 59/70 (84%) , Positives = 64/70 (91%) 



Query: 1

M+EFTF I E LL LSEN-HKGWTKELNRVSFNGA AK+D+RTWSPDHTKMGKGITL+NEE 
Sbjct: 1 MAEFTFNIEEHLLTLSElOCGWTKEIJmVSFNGAE 60 

Query: 61 FKVILDAFRK 70
FK ILDAFRK
Sbjct: 61 FKTILDAFRK 70

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 338 

A DNA sequence (GBSx0368) was identified in S.agalactiae <SEQ ID 1093> which encodes the amino
acid sequence <SEQ ID 1094>. Analysis of this protein sequence reveals the following:

Possible site: 61 

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -2.66 Transmembrane 92 - 108 ( 92 - 110)

Final Results

bacterial membrane --- Certainty=0.2062 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database:





[Database hit header and alignment residue rows missing; row start positions: Query 4, 64, 124, 184, 244, 304, 364; Sbjct 18, 78, 138, 198, 257, 316, 376. Surviving match lines:]

K+PE+L+PAG LEKLK+A+ YGADAVF+GGQ YGLRS A NF++EE+ EG+ +A
N+HNG ++LD + +IV+DP +1 C AP +E+HLSTQ
S + N++ +FWKE GL RWLARE + E+ E+4-++ D4EIE+F+HGAMCI+YSGRCVLS
E G+DSLKIEGRMKSIHYV+TV + Y+ 4-DAY PE F I+++ ++EL K A R+ A
T F4 TP EQ+FG K Y FVG V++4D
h D +GN++D A +P++++



A related DNA sequence was identified in S.pyogenes <SEQ ID 1095> which encodes the amino acid 
sequence <SEQ ID 1096>. Analysis of this protein sequence reveals the following: 



Possible site: 61

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -2.66 Transmembrane 92 - 108 ( 92

Final Results

bacterial membrane --- Certainty=0.2062 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

>GP:BAB04993 GB:AP001511 protease [Bacillus halodurans] 
Identities = 201/403 (49%) , Positives = 280/403 (68%) , Gaps = 4/403 (0%) 



Query: 66 VYVAANMVTHEGNEIGAGEWFRQLRDMGLDAVIVSDPALIVICSTEAPGLEIHLSTQASS 125
VYV N+ H N G E+ L+++G+ +IV+DP +I C AP +E+HLSTQ S
Sbjct: 77 VYVTTNIYAHNENMDGLEEYLSALQEVGVTGIIVADPLIIETCKRVAPKVEVHLSTQQSL 136

[Remaining residue rows missing; row start positions: Query 6, 126, 185, 245, 305, 365; Sbjct 17, 137, 197, 255, 314, 374. Surviving match lines:]

+FWK GL RWLAREV -f E+ E++K D+EIE FVHGAMCI S YSGRCVLSNH
M+ RD+NRGGC QSCRW YDLY+ E +G++P Y+MS D+ +1 IP LIE
G+DSLK+EGRMKSIHYV+TVT+ Y-t- + AY P+ F IK E ++EL K A R+ A



An alignment of the GAS and GBS proteins is shown below: 
Identities = 386/427 (90%) , Positives = 404/427 (94%) 

Query: 1 MSIsmCKRPEVLSPAGTLEKLKVAIDYGADAVFVGGQAYGLRSRAGNFSMEELQEGINYAH 60
MS++KKRPEVLSPAGTLEKLKVAIDYGADAVFVGGQAYGLRSRAGNFSMEELQEGI+YAH
Sbjct: 1 MSHMKKRPEVLSPAGTLEKLKVAIDYGADAVFVGGQAYGLRSRAGNFSMEELQEGIDYAH 60

Query: 61 ARDAKVYVAANMVTHEGNELGAGPWFRELRDMGLDAVIVSDPALIVICATEAPGLEIHLS 120
AR AKVYVAANMVTHEGNE+GAG WFR+LRDMGLDAVIVSDPALIVIC+TEAPGLEIHLS

TQASSTWYETFEFWK MGLTRWLAREV ['IAELAEIRKRTDVEIEAFVHGAMCISYSGRC

[Remaining residue rows missing; row start positions for Query and Sbjct: 121, 181, 241, 301, 361. Surviving match lines:]

D+IENGVDSLKIEGRMKSIHYVSTVTNCYKAAV AYMESPEAF AIKE+LIDELWKVAQR
ELATGFYY PTENEQLFGARRKIPQYKFVGEW+FD+A M ATIRQRNVIMEGDR+E Y
L DA+G KIDRAPNPMELLTI+LF VK GDMIRACKEGLVNLYQ D

GTSKTVR 427

SEQ ID 1094 (GBS385) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 69 (lane 3; MW 50kDa). It was also expressed in E.coli as a GST-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 72 (lane 7; MW 75.7kDa). 

The GBS385-GST fusion product was purified (Figure 213, lane 7) and used to immunise mice. The 
resulting antiserum was used for FACS (Figure 312), which confirmed that the protein is immunoaccessible 
on GBS bacteria. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 339 

A DNA sequence (GBSx0369) was identified in S.agalactiae <SEQ ID 1097> which encodes the amino 
acid sequence <SEQ ID 1098>. This protein is predicted to be collagenase. Analysis of this protein 
sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2208 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 



Query: 1 MEKIILTATAESIEQVKQLLAIGIDRIYVGEENYGLRLPHSFSDDELRHIAKLVHDAGKE 60

M+K li T S + L+ G VGE+ YGLRL FS +++ + ++ H G + 

Sbjct: 1 MKKPELLVTPTSTADILPLIQAGATAFLVGEQRYGLRLAGEFSREDVTKAVEIAHKEGAK 60 

Query: 61 LTVACNALMHQEMMDNIKPFLELMKEINVDYLWGDAGVFYINKRDGYNFKLIYDTSVFV 120 

+ VANA+H + + + +L+EVDVGDV + +KL + T 
Sbjct: 61 VWAVmiFHNDKVGELGEYI^FIAEAGVDAAVFGDPAVLMAARESAPDLKLHWSTETTG 120 

Query: 121 TSSRQWFWGQHGAVETVLAREIPSEELFKIvlSENLZFPAEILVYGASVIHHSKRPLLQNY 180 

T+ N+WG+ GA +VLARE+ + + ++ EN E EI V+G + + SKR h+ NY 
Sbjct: 121 TNYYTOraTORKGAARSVLAREIjNMDSIVEI 180 

Query: 181 YNF---TH1TDEKTRERGLFI^PGDPESHYSIYEDKHGTHIFINNDINMMTKVTELVEH 237 

+ + + K +E G+FD + + ++ Y I+ED++GTHI ND+ ++ ++ EL++ 

Sbjct: 181 FEYQGKVMDIERKKKESGMFLHDK-ERDNKYPIFEDENGTHIMSPNDVCIIDELEELIDA 239 

Query: 238 HFTHWKLDGIYCPGDNFVAIAEIFVETARL-IENGTFTQDQAFLFDERIRKLHPKGRGLD 296 

+K+DG+ ++ + +++ E I> +EN + + + ERI + P R +D 
Sbjct: 240 GIDSFKIDGVLKMPEYLIEVTKI'IYREAIDLCVENRDEYEAKKEDWIERIESIQPVNRKID 299 

Query: 297 TGFY 300 

Sbjct: 300 TGFF 303 

A related GBS nucleic acid sequence <SEQ ID 10949> which encodes amino acid sequence <SEQ ID
10950> was also identified.

A related DNA sequence was identified in S.pyogenes <SEQ ID 1099> which encodes the amino acid
sequence <SEQ ID 1100>. Analysis of this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1716 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 245/308 (79%) , Positives = 273/308 (88%) 



Query: 1 MEKIILTATAESIEQVKQLLAIGIDRIYVGEENYGLRLPHSFSDDELRHIAKLVHDAGKE 60
MEKII+TATAESIEQVK LLA G+DRIYVGE NYGLRLPH+FS DELR+IAKLVHDAGKE
Sbjct: 1 MEKIIITATAESIEQVKALIAAGVDRIYVGEANYGLRLPHNFSYDELRQIAKLVHDAGKE 60

Query: 61 LTVACNALMHQENMDNIKPFLELMKEINVOT^ 120
LTVACNALMHQ+MMD IKPFL+LM EI VDYLWGDAGVFY+NKRDGYNFKLIYDTSVFV
Sbjct: 61 LTOACNALMHQDMMDQIKPFIiDLMIEIAX'DyLVVGDAGVFYVNKRDGYNFKLiyDTSVFV 120

Query: 301 DFDPSTVK 308
+FDP TVK
Sbjct: 301 EFDPKTVK 308



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 340 

A DNA sequence (GBSx0371) was identified in S.agalactiae <SEQ ID 1101> which encodes the amino 
acid sequence <SEQ ID 1102>. This protein is predicted to be cDNA EST yk542cl2.5 comes from this 
gene. Analysis of this protein sequence reveals the following: 

Possible site: 16 

>>> Seems to have a cleavable N-term signal seq.

Final Results

bacterial outside --- Certainty=0.3000 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAD1SS22 GB:U75480 unknown [Streptococcus mutans] 
Identities = 69/152 (45%) , Positives = 101/152 (66%) , Gaps = 12/152 (7%) 

Query: 1 MSKLFKTLVISAASGARAA.YFLTTKKGKELRKNAEKFYGEYKENPEEYHQIAKDKASEYS 60 

MSK KT +1 A +GAAA&YFL+T KGK+ +K + + +YKENP+EYHQ A DK +EY 
Sbjct: 1 MSKFLKTAIIGAGTGAAAAYFLSTDKGKQFKK3CIHQTFTDYKENPKEYHQYARDIOTNEYK 60 

Query: 61 NLAVDTFKDYKGKFESGELTTEDIVSAVKEKSGEWDFANDFVNQAKSKFSDEDTAKKED 120

++AV +FKDYK KFE+GELT ++I+S+VKEK+ + FAN ++Q K + T +K + 
Sbjct: 61 DVAVHSFKDYKDKFETGELTKDNI I SSVKEKASQAGKFANSKLSQVKDHLA- - QTVEKAE 118 

Query: 121 KAP ETKVEDIVIDYKENTEDKE 142 

4 + +V+DIVIDY+ + K+ 

Sbjct: 119 ASTNDAGIPLGEMKAQVDDIVIDYQAEEKTKK 150 

A related DNA sequence was identified in S.pyogenes <SEQ ID 1103> which encodes the amino acid
sequence <SEQ ID 1104>. Analysis of this protein sequence reveals the following:



Possible site: 26 
>>> Seems to have r 



INTEGRAL Likelihood = -1.81 Transmembrane 15 - 31 ( 14 - 



Final Results 

bacterial membrane --- Certainty=0.1723 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related sequence was also identified in GAS <SEQ ID 9117> which encodes the amino acid sequence 
<SEQ ID 9118>. Analysis of this protein sequence reveals the following: 



Final Results

bacterial outside --- Certainty=0.300 (Affirmative) < succ>
bacterial membrane --- Certainty=0.000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below: 

Identities = 69/140 (49%), Positives = 91/140 (64%), Gaps = 8/140 (5%) 

Query: 1 MSKLFKTLVISAASGAAAAYFLTTKKGKELRKNAEKFYGEYKENPEEYHQIAKDKASEYS 60 

M+K FK LVI A SG AAAYFL+T+KGK L+ AEK Y YKE+P++YHQ AK+K SEYS 
Sbjct: 8 MNKSFKNLVIGAVSGVAAAYFLSTEKGKALKNRAEKAYQAYKESPDDYHQFAKEKGSEYS 67 

Query: 61 OTAVDTFKDYKGKFESGELTTEDIVSAVTCEKSGEVVDFANDFVNQAKSKFSD-EDTAKKE 119 

+LA DTF D K K SG+LT ED++ +K+K+ FV + K ++ E K++ 

Sbjct: 68 HLARDTFYDVKDKLASGDLTKEDMLDLLKDKT TAFVQKTKETLAEVEAKEKQD 120 

Query: 120 DKAPETKVEDIVIDYKENTE 139 

D + EDI4IDY E E 
Sbjct: 121 DVIIDLNEEDIIIDYTEQDE 140 

SEQ ID 1102 (GBS164) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 30 (lane 4; MW 17.4 kDa).

The GBS164-His fusion product was purified (Figure 115A; see also Figure 200, lane 4) and used to
immunise mice (lane 1+2+3 product; 20 µg/mouse). The resulting antiserum was used for Western blot,
FACS (Figure 115B), and in the in vivo passive protection assay (Table III). These tests confirm that the
protein is immunoaccessible on GBS bacteria and that it is an effective protective immunogen.

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
Example 341

A DNA sequence (GBSx0372) was identified in S.agalactiae <SEQ ID 1105> which encodes the amino
acid sequence <SEQ ID 1106>. Analysis of this protein sequence reveals the following:
Possible site: 19 

>>> Seems to have an uncleavable N-term signal seq
INTEGRAL Likelihood =-16.93 



Final Results 

bacterial membrane --- Certainty=0.7771 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 



Query: 1 MIEIAVLIIAIAFVVLVI£ILFVLKKVSETIEETKQTIKOTTSDVNVTLYQTNEILAKAN 60 

M EIA+LI+AIAF VLV+ ++ +L+K+S+T++E++QT+K+LTSDVNVTLYQTNE+LAKAN 

Sbjct: 1 MWEIALLIVAIAFAvLVIYLILLLRKISDTVDESRQTLKILTSDVNVTLYQTNEIiLAKAN 60 

Query: 61 VLVDDWGCTSTIDPLFVAIADLSESVSDLNLQARHIGQKASSATSSVTKAGSALAIGKA 120 

VLV+DVNGKV TIDPLF AIADLS SVSDLN QAR+ G+K +T+4V KAG+A GK 

Sbjct: 61 VX1VEDWGKVETIDPLFTAIADLSVSVSDI1NRQARYFGKKTRKSTANVGKAGAA.YTPGKV 120 



A related DNA sequence was identified in S.pyogenes <SEQ ID 1107> which encodes the amino acid
sequence <SEQ ID 1108>. Analysis of this protein sequence reveals the following:

Possible site: 16 
>>> Seems to have a cleavable N-term signal seq.

INTEGRAL Likelihood = -0.35 Transmembrane 18 - 34 ( 17 - 34) 

Final Results

bacterial membrane --- Certainty=0 .1341 (Affirmative) < suco 

bacterial outside --- Certainty=0 . 0000 (Not Clear) < suco 

bacterial cytoplasm --- Certainty=0 . 0000 (Not Clear) < suco 

The protein has homology with the following sequences in the databases:

>GP:AAD15621 GB:U75480 unknown [Streptococcus mutans] 
Identities = 83/128 (64%) , Positives = 110/128 (85%) 

Query: 6 ISLMIIAI^FVALVIFLIIVLKKVSETIDEAKKTISVIjTSDVNVTIjHQTNDILAKANILV 65 
I+L+I+A+AF LVI+LI++L+K+S+T+DE+++T+ +LTSDVNVTL+QTN++LAKAN+LV

Sbjct: 4 lALLIVAIAFAVLVIYLILLLRKISD'rVDESRQTLKILTSDVWTLYQTNELLAKANVLV 63 

Query: 66 EDVKGKVATIDPLFVAIADLSESLSDLKSQARHFGQKATNATGNVSKAGKLALVGKVASK 125 
EDVNGKV TIDPLF AIADLS S+SDLN QAR+FG+K +T NV KAG GKVASK 
Sbjct: 64 EDWGKVETIDPLFTAIADLSVSVSDLNRQARYFGKKTRKSTANVGKAGAAYTFGKVASK 123



An alignment of the GAS and GBS proteins is shown below: 

Identities = 92/131 (70%) , Positives = 116/131 (88%) 



Query: 1 MIEIAVLIIAIAFVVLVLGILE^/LKKVSETIEETKQTIK\^TSD , /NVTLYQTlffiILAKM 60 
++ I+++IIA+AFV LV+ ++ VLKKVSETI+E K+TI vLTSDVNVTL+QTN+ILAKAN
Sbjct: 3 LVGISLMIIALAFVALVIFLIIVLKOTSETIDEaKKTISVLTSDVWTLHQraDILAKflM 62

Query: 61 VLVDDVNGKVSTIDPLFVAIADLSESVSDLMiQftHHIGQKftSSATSSVTKAGSALAIGKA 120
+LV+DWGKV+TIDPLFVAIADLSES+SDLN QARH GQKA++AT +V+KAG +GK
Sbjct: 63 ILVEDWGKVATIDPLFVAIADLSESLSDljNSQAEHFGQKATNATGNVSKAGKriALVGKV 122



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 342 

A DNA sequence (GBSx0373) was identified in S.agalactiae <SEQ ID 1109> which encodes the amino 
acid sequence <SEQ ID 1110>. Analysis of this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm Certainty=0. 0462 (Affirmative) < suco 

bacterial membrane Certainty=0 . 0000 (Not Clear) < suco 

bacterial outside Certainty=0. 0000 (Not Clear) < suco 

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 343

A DNA sequence (GBSx0374) was identified in S.agalactiae <SEQ ID 1111> which encodes the amino 
acid sequence <SEQ ID 1112>. This protein is predicted to be prolipoprotein diacylglyceryl transferase (lgt).
Analysis of this protein sequence reveals the following: 

Possible site: 29 



>>> Seems to have an uncleavable N-term signal seq
INTEGRAL Likelihood = Transmembrane 231 - 247 ( 225 - 251)
INTEGRAL Likelihood = Transmembrane 89 - 105 ( 87 - 107)
INTEGRAL Likelihood = Transmembrane 18 - 34 ( 13 - 36)
INTEGRAL Likelihood = Transmembrane 46 - 62 ( 46 - 64)



Final Results 

bacterial membrane --- Certainty=0.4354 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9721> which encodes amino acid sequence <SEQ ID 9722>
was also identified. 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAC80171 GB:U75480 putative prolipoprotein diacylglycerol 
transferase [Streptococcus mutans] (ver 3) 
Identities = 184/257 (71%) , Positives = 226/257 (87%) 
Sbjct: 1 MINPIAIKLGPLTIRWYSICIVTGLIIAWLTIR3APKKNIKSDDVLDFILIAFPLAIVG 60

Query: 62 ARIYWIFEI^YYSKHPVEIIAIWNGGIAIYGGLITGAILLVIFSYRRLINPIDFLDIAA 121 

AR+YYVIF+W YY K+P EI IW+GGIAIYGGL+TGA++L IFSY R+I PIDFLD+AA 
Sbjct: 61 ARLYYVIFDWDYYLKNPSEIPVIWHGGIAIYGGLLTGALVLFIFSYYRMIKPIDFLDVAA 120 

Query: 122 PGVMIAQAIGRWGNFINQEAYGRAVKlsnjNYVPNFIKWQMYIDGAYRVPTFLYESriWNFIjG 181 

PGVM+AQ+IGRWGNF+NQEAYG+ V LNY+P+FI+ QMYIDG YR PTFLYESLWN LG 
Sbjct: 121 PGVM1AQSIGRWGNFWQEAYGKTVTQLNYLPDFIRKQMYIDGHYRTPTFLYESLWNLLG 180 

Query: 182 FVIIMSIRHRPRTLKQGEVACFYLVWYGCGRFIIEGMRTDSLYLAGLRVSQWLSVILVII 241 

F+IIM +R RP LK+GEVA FYL+WYG GRF+IEGMRTDSL A LRVSQWLSV+LV++ 
Sbjct: 181 FIIIMILRRRPNLLKEGEVaFFYLIlTO-GSGRFVIEGMRTDSLMFASLRVSQWLSVLLWV 240 

Query: 242 GIVMIIYRRREQHISYY 258

G+++++ RRR I YY 
Sbjct: 241 GVILMVIRRRNHAIPYY 257 

A related DNA sequence was identified in S.pyogenes <SEQ ID 1113> which encodes the amino acid
sequence <SEQ ID 1114>. Analysis of this protein sequence reveals the following:



INTEGRAL Likelihood = -7 Transmembrane 229 - 245 ( 222 -
INTEGRAL Likelihood = -6 Transmembrane 45 - 61 ( 40 - 68)
INTEGRAL Likelihood = -4 Transmembrane 17 - 33 ( 11 -
INTEGRAL Likelihood = -4 Transmembrane 87 - 103 ( 86 -
INTEGRAL Likelihood = -0 0 - 186 ( 170 -

Final Results

bacterial membrane --- Certainty=0.3803 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

>GP:AAC80171 GB:U75480 putative prolipoprotein diacylglycerol 
transferase [Streptococcus mutans] (ver 3) 
Identities = 176/258 (68%), Positives = 217/258 (83%) 

Query: 1 MINPIALKCGPLAIHWYALCILSGLVLAVYLASKEAPKKGISSDAIFDFILIAFPLAIVG 60 

MINPIA+K GPL I WY++CI++GL+LAVYL +EAPKK I SD + DFILIAFPLAIVG 
Sbjct: 1 MINPIAIKLGPLTIRWYSICIVTGLILAVYLTIREAPKKNIKSDDVLDFILIAFPLAIVG 60

Query: 61 ARIYWIFEWSYYVKHLDEIIAIWNGGIAIYGGLITGALVLLAYCYNKVLNPIHFLDIAA 120 

AR+YYVIF+W YY+K+ EI IW+GGIAIYGGL+TGALVL + Y +++ PI FLD+AA 
Sbjct: 61 ARLYYVIFDWDYYLKNPSEIPVI11HGGIAIYGGLLTGALVLFIFSYYRMIKPIDFLDVAA 120 

Query: 121 PSVMVAQAIGRWGNFINQEAYGKAVSQLNYLPSFIQKQMFIEGSYRIPTFLYESLWNLLG 180 

P VM+AQ+IGRWGNF+NQEAYGK V+QLNYLP FI+KQM+I+G YR PTFLYESLWNLLG 
Sbjct: 121 PGWMQSIGRWGNFVNQEAYGKTOTQLNYLPDFIRKQI'IYIDGHYRTPTFLYESLWNLLG 180 

Query: 181 FVIIMMWRRKPKSLLDGEIFAFYLIWYGSGRLVIEGMRTDSLMFLGIRISQYVSALLIII 240 

F+IIM+ RR+P L +GE+ FYLIWYGSGR VIEGMRTDSLMF +R+SQ++S LL+++ 
Sbjct: 181 FIIIMILRRRPNLLKEGEVAFFYLIVJYGSGRFVIEGMRTDSLMFASLRVSQWLSVLLWV 240 

Query: 241 GLIFVIKRRRQKGISYYQ 258 

G+I ++ RRR I YYQ 
Sbjct: 241 GVILMVIRRRNHAIPYYQ 258 

An alignment of the GAS and GBS proteins is shown below: 

Identities = 176/257 (68%) , Positives = 221/257 (85%) 

Query: 2 MINPVAIRLGPFSIRVffAICIVSGMLl^VYIAMKEAPRKNIKSDDILDFILMAFPLSIVG 61 

MINP+A++ GP +1 WYA+CI+SG++LAVYLA KEAP+K I SD I DFIL+AFPL+IVG 
Sbjct: 1 MINPIALKCGPLAIHWYALCILSGLVIAVYIiASKEAPKKGISSDAIFDFILIAFPLAIVG 60



WO 02/34771 



PCT/GB01/04789 






Query: 62 ARIYYVIFEmYYSKHPVEIIAIWNGGIAIYGGLITGAILLVIFSYRRLINPIDFLDIAA 121 

ARIYYVIFEW+YY KH EIIAIWNGGIAIYGGLITGA++L+ + Y +++NPI FLDIAA 
Sbjct: 61 ARIYWIFEWSYYVI<HLDEIIAIWNGGIAIYGGLITGALVIiLAYCYNKVLNPIHFLDIAA 120 

Sbjct: 

Query: 182 FVIIMSIRHRPRTLKQGEVACFYLVWYGCGRFIIEGMRTDSLYIAGLRVSQWLSVIIjVII 241 

FVIIM R +P++L GE+ FYL+WYG GR +IEGMRTDSL G+R+SQ+4S +L+II 
Sbjct: 181 FVIIMMWRRKPKSLLDGEIFAFYLIWYGSGRLVIEGMRTDSLMFLGIRISQYVSALLII1 240 

Query: 242 GIVMI IYRRREQHISYY 258 

G++ +1 RRR++ ISYY 
Sbjct: 241 GLIFVIKRRRQKGISYY 257 



A related GBS gene <SEQ ID 8557> and protein <SEQ ID 8558> were also identified. Analysis of this
protein sequence reveals the following:

Lipop: Possible site: -1 Crend: 0 

McG: Discrim Score: 2.45 

GvH: Signal Score (-7.5): -2.9 
Possible site: 39 

>>> Seems to have an uncleavable N-term signal seq

ALOM program count: 3 value: -8.39 threshold: 0.0 

INTEGRAL Likelihood = -8.39 Transmembrane 209 - 225 ( 203 - 229) 
INTEGRAL Likelihood = -7.64 Transmembrane 67 - 83 ( 65 - 85) 
INTEGRAL Likelihood = -1.86 Transmembrane 24 - 40 ( 24 - 42) 
PERIPHERAL Likelihood =0.79 92 
modified ALOM score: 2.18 



*** Reasoning Step: 3 

Final Results 

bacterial membrane --- Certainty=0.4354 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>
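
Reports of the kind shown above have a regular line structure, so the predicted transmembrane segments and the localisation certainties can be pulled out mechanically. The following is a minimal sketch under stated assumptions: `parse_report`, both regular expressions and the tidied `SAMPLE` text are hypothetical helpers written for this illustration, not part of the program that produced the report. The certainty pattern tolerates the stray spaces that text extraction leaves inside numbers such as `0 .4354`.

```python
import re

# "INTEGRAL Likelihood = -8.39 Transmembrane 209 - 225 ( 203 - 229)"
INTEGRAL_RE = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+\.\d+)\s+Transmembrane\s+(\d+)\s*-\s*(\d+)"
)
# "bacterial membrane --- Certainty=0 .4354 (Affirmative)"
RESULT_RE = re.compile(
    r"bacterial\s+(membrane|outside|cytoplasm)\W*Certainty\s*=\s*([\d.\s]+?)\s*\((.*?)\)"
)


def parse_report(text):
    """Return (segments, locations) extracted from one localisation report."""
    segments = [
        {"likelihood": float(m.group(1)), "start": int(m.group(2)), "end": int(m.group(3))}
        for m in INTEGRAL_RE.finditer(text)
    ]
    locations = {
        # Remove whitespace that extraction inserted inside the number.
        m.group(1): (float(re.sub(r"\s+", "", m.group(2))), m.group(3))
        for m in RESULT_RE.finditer(text)
    }
    return segments, locations


# Lines copied from the report above, lightly tidied for the example.
SAMPLE = """\
INTEGRAL Likelihood = -8.39 Transmembrane 209 - 225 ( 203 - 229)
INTEGRAL Likelihood = -7.64 Transmembrane 67 - 83 ( 65 - 85)
INTEGRAL Likelihood = -1.86 Transmembrane 24 - 40 ( 24 - 42)
bacterial membrane --- Certainty=0 .4354 (Affirmative) < succ>
bacterial outside --- Certainty=0 . 0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0 . 0000 (Not Clear) < succ>
"""

segments, locations = parse_report(SAMPLE)
best = max(locations, key=lambda name: locations[name][0])
print(len(segments), "segments; highest certainty:", best, locations[best][0])
```

Lines whose numbers were truncated during extraction simply fail to match and are skipped, which is usually preferable to guessing the missing digits.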



The protein has homology with the following sequences in the databases: 

ORF01400 (238 - 1008 of 1308)

SP|P72482|LGT_STRMU(1 - 257 of 259) PROLIPOPROTEIN DIACYLGLYCERYL TRANSFERASE (EC 2.4.99.-).
Gp|4583534|gb|AAC80171.3| |U75480 putative prolipoprotein diacylglycerol transferase
{Streptococcus mutans} PIR|T11569|T11569 prolipoprotein diacylglyceryl transferase (EC
2.4.99.-) - Streptococcus mutans

%Match =46.9

%Identity =71.6 %Similarity =89.5 

Matches = 184 Mismatches = 27 Conservative Sub.s = 46 



198 228 258 288 318 348 378 408 

WGLMLPRLLRIV*HI*LvRTRSMMINPVAIRLGPFSIRWYAICIVSGMLIjAVYLAMKEAPRlQJIKSDDILDFILMAFPLS
lll|:|l=lll==llll=lll|:|==lllll - I I I = I I I I I I I = i I I I I : I I I I = 
MINPIAIKLGPLTIRWYSICIVTGLILAVYLTIREAPKKNIKSDDVLDFILIAFPLA 
10 20 30 40 50 

438 468 498 528 558 588 618 648

IVGARIYYVIFEWAYYSKHPVEIIAIWNGGIAIYGGLITGAILLVIFSYRRLINPIDFLDIAAPGVMIAQAIGRWGNFIN 
llllhllllhl II l-'l II Mil 1 = 1 I I I I I I : I I I I I I : I I : I I I I I I I = I 

IVGARLYWIFDWDYYLKNPSEIPVIWHGGIAIYGGLLTGALVLFIFSYYRMIKPIDFLDVAAPGVMLAQSIGRWGNFVN 
70 80 90 100 110 120 130 


678 708 738 768 798 828 858 888 

QEAYGRATOEJWPNFIKNQMYIIXSAYRVPTFLYESLWNFLGFVIIMSIRHRPRTLKQGEVACFYLVWYGCGRFIIEGM 
llllh I IH:|:||: llllll II 1111111111 = 111 = 111 =1 II 11 = 1111 |ll = ||| ||| = |||| 
QEAYGKTVTQLNYLPDFIRKQMYIDGHYRTPTFLYESLVffilLLGFIIIMIIjRRRPNLLKEGEVAFFYLIWYGSGRFVIEGM 
150 160 170 180 190 200 210
918 948 978 1008 1038 1068 1098 1128

RTDSLYIiRGLRVSQWLSVILVIIGlVMIIYRRREQHlSYY*TEEVL**KLLy*LLPLRLLP*F*EYFSP*KKYQKRLRKP 
lllll =1 llllllll|:||:=|::=:= III = I II 
RTDSLMFASLRVSQWLSVLIiVWGVILMVIRRRNHMPYYQC 



230 



140 



250 
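
The summary statistics reported for this match (Matches = 184, Mismatches = 27, Conservative Sub.s = 46, %Identity = 71.6, %Similarity = 89.5) are mutually consistent if the aligned length is taken as the sum of the three counts. That relationship is an inference from the printed numbers, not a documented definition, and the function name `match_percentages` below is hypothetical. The %Match figure is not reproduced because its definition cannot be recovered from the text.

```python
def match_percentages(matches, mismatches, conservative):
    """Recompute identity and similarity percentages from the three counts.

    Assumption: aligned length = matches + mismatches + conservative
    substitutions, and similarity counts matches plus conservative
    substitutions.
    """
    aligned = matches + mismatches + conservative
    identity = 100.0 * matches / aligned
    similarity = 100.0 * (matches + conservative) / aligned
    return aligned, round(identity, 1), round(similarity, 1)


aligned, identity, similarity = match_percentages(184, 27, 46)
print(aligned, identity, similarity)  # 257 71.6 89.5
```

The aligned length of 257 obtained this way agrees with the "1 - 257 of 259" span given for the database entry.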



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics.



Example 344 

A DNA sequence (GBSx0375) was identified in S.agalactiae <SEQ ID 1115> which encodes the amino
acid sequence <SEQ ID 1116>. Analysis of this protein sequence reveals the following:



>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2817 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database: 



M+VTV+MLVD++KL+VIYGD+ LLSK ITT+DISRPGLEMTGYFDYY+PERLQL+GMKEW

IPAEILRHLLEIRGVGI ID:
V+AKDEETLWGEPAEILRHLLEIRGVGIID-
Sbjct: 181 VFAKDEETLWGEPAEILRHLDEIRGVGIIDi

iSSQVQLAIYLENFETGKV 240
SLYGASAVKDSSQVQLAIYLEN+E+GKV
SLYGflSAVKDSSQVQLAIYLENYESGKV 240

Query: 241 FDRLGNGNEEIELSGVKVPRIRIPVKTGRNVSWIEAAAMNHRAKQMGFDATQTFEDRLT 300
FDRLGNGNEE+ELSGVK+PR+RIPV+TGRM+SWIEAAAMN+RAKQMGFDAT+TFE+RLT
Sbjct: 241 FDRLGNGNEELELSGVKIPRLRIPVQTGRM^SWIEAAAMNYRAKQMGFDATKTFEERLT 300

Query: 301 HLISQNEVN 309
LI++NE N
Sbjct: 301 QLITKNEGN 309



A related DNA sequence was identified in S.pyogenes <SEQ ID 1117> which encodes the amino acid
sequence <SEQ ID 1118>. Analysis of this protein sequence reveals the following:

Possible site: 13 

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm Certainty=0. 2391 (Affirmative) < suco 

bacterial membrane Certainty=0 . 0000 (Not Clear) < suco 

bacterial outside Certainty=0. 0000 (Not Clear) < suco 



An alignment of the GAS and GBS proteins is shown below: 

Identities = 255/309 (82%) , Positives = 288/309 (92%) 
Query: 1 ^VTVQMLVDRLKLNViyGDEHLLSKRITTADISRPGLEMTGYFD'/YAPERLQLVGMKEW 60

M VTV+MLV ++KL+V+Y ++LLSK ITT+DISRPGLEMTGYFDYYAPERLQL GMKEW 

Sbjct: 32 MTVTVKMLVQKVKLDVVYATDNLLSKEITTSDISRPGLEMTGYFDYYAPERLQLFGMKEW 91

Query: 61 SYLMAMTGHNRYQVLREMFQKETPAIVVARDLEIPEEKYEAAKDTGIAILQSKAPTSRLS 120

SYL MT HNRY VL+EMF+K+TPA+W+R+L 1P+EM +AAK+ GI++L S+ TSRL+ 

Sbjct: 92 SYLTQMTSHNRYSVLKEMFKKDTPAWVSRNLAI PKEKVQAAKEEGI SLLSSRVSTSRLA 151 

Query: 121 GEVSWYLDSCLAERTSVHGVLMDIYGMGVLIQGDSGIC-KSETGLELVKRGHRLVADDRVD 180

GE+S++LD+ LAERTSVHGVLMDIYGMGVLIQGDSGIGKSETGLELVKRGHRLVADDRVD 

Sbjct: 152 GEMSYFLDASLAERTSVHG^MDIYGMGVLIQGDSGIGKSETGLELVKRGHRLVADDRVD 211 



Query: 181 VYAKDEETLWGEPAEILRHLLEIRGVGIIDIMSLYGASAVKDSSQVQLAIYLENFETGKV 240 

VYAKDEETLWGEPAEILRHLI.EIRGVGIID+MSLYGASAVKDSSQVQLAIYLENFE GKV 
Sbjct: 212 VYAKDEETLWGEPAEILRHLLEIRGVGIIDVMS^YGASAVKDSSQVQLAIYLENFEAGKV 271 

Query: 241 FDRLGNGNEEIELSGVKVPR1RIPVIOTGRNVSWIEAAAMNHRAKQMGFDATQTFEDRLT 300 
FDRLGNGNEEI SGV++PRIRIPVKTGRNVSWIEAAAMNHRAK+MGFDAT+TFEDRLT 

Query: 301 HLISQNEVN 309 

LI++NEV+ 
Sbjct: 332 QLITKNEVS 340 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 345 

A DNA sequence (GBSx0376) was identified in S.agalactiae <SEQ ID 1119> which encodes the amino 
acid sequence <SEQ ID 1120>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1835 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



A related GBS nucleic acid sequence <SEQ ID 9719> which encodes amino acid sequence <SEQ ID 9720> 
was also identified.

The protein has no significant homology with any sequences in the GENPEPT database. 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 346

A DNA sequence (GBSx0377) was identified in S.agalactiae <SEQ ID 1121> which encodes the amino 
acid sequence <SEQ ID 1122>. Analysis of this protein sequence reveals the following:

Possible site: 37 

>>> Seems to have an uncleavable N-term signal seq
INTEGRAL Likelihood = -4.88 Transmembrane 35 - 51 ( 31 - 59)

Final Results 

bacterial membrane --- Certainty=0. 2954 (Affirmative) < suco 
bacterial outside Certainty=0 . 0000 (Not Clear) < suco 
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAC67275 GB:AF017113 YvlC [Bacillus subtilis]
Identities = 21/63 (33%) , Positives = 36/63 (56%) , Gaps = 2/63 (3%) 



SSFYKQRKGKLVCGWAGLADKYNKDLftLSR^IALILYFTKF- -GLLLYILLAVFLPYK SO 
+ Y+ K K + GV+ GLA+ +NWD +L RV+ ++ T LL+YI+ +P + 

NKLYRSEKNKKIAGVIGGLAEYFNWEASLLRVITVILAIMTSVLPVLLIYIIWIFIVPSE 61 



A related DNA sequence was identified in S.pyogenes <SEQ ID 1123> which encodes the amino acid
sequence <SEQ ID 1124>. Analysis of this protein sequence reveals the following:

Possible site: 32 

>>> Seems to have an uncleavable N-term signal seq 

INTEGRAL Likelihood = -5.26 Transmembrane 39 - 55 ( 31 - 61) 


Final Results 

bacterial membrane Certainty=0 .3102 (Affirmative) < suco 

bacterial outside Certainty=0 .0000 (Not Clear) < suco 

bacterial cytoplasm Certainty=0 . 0000 (Not Clear) < suco 


An alignment of the GAS and GBS proteins is shown below: 

Identities = 60/90 (66%), Positives = 77/90 (84%), Gaps = 3/90 (3%)

Query: 1 MKSSFYKQRKGKLVCGWAGIOTKXNfTOIiALSRVLIALIIiYFTKFGLLLYILLAVFLPYK SO 
+++ FYKQRK +LV GV+AGLADKY MDLAL+RVL AL++Y T FG+LLYILIA+FLPYK

Sbjct: 1 VETKFYKQRKtJRLVAGVIAGIJ^KYGWDLAIJVRVLAALLIYGTGFGVLLYILIjAIFLPYK SO 

Query: 61 EDIIETR-RQGPRRRKDAEPV- -DDDGWFW 87 
ED++E R +GPRRRKDA+ + ++DGWFW 
Sbjct: 61 EDLLEERYGRGPRRRKDADVLNEEEDGWFW 90

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 



Example 347 

A DNA sequence (GBSx0378) was identified in S.agalactiae <SEQ ID 1125> which encodes the amino
acid sequence <SEQ ID 1126>. Analysis of this protein sequence reveals the following:

Possible site: 19 

>>> Seems to have no N-terminal signal sequence 

Final Results

bacterial cytoplasm Certainty=0. 3577 (Affirmative) < suco 

bacterial membrane Certainty=0 . 0000 (Not Clear) < suco 

bacterial outside Certainty=0. 0000 (Not Clear) < suco 



A related GBS nucleic acid sequence <SEQ ID 9717> which encodes amino acid sequence <SEQ ID 9718>
was also identified. 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:BAB04250 GB:AP001508 unknown conserved protein [Bacillus halodurans] 
Identities = 379/729 (51%) , Positives = 515/729 (69%) , Gaps = 25/729 (3%) 


Query: 29 ENLNITQIAIDLGIKASQI3KVLELTDEGNTIPFIARYRKEMIGNLDEVQIKSIIDLDKS 88 






Sbjct: 

Query: 89 MTALSDRKTTVLAKIEEQGI<LTQELKI<AIEEATKLADVEELYLPYKEKRRTKATIAREAG 148 

L +RK V+ +EEQGKLT E KK +E+A KL +VE+LY PYK+KRRT+AT+A+E G 
Sbjct: 68 ANQLHERKEEVIRLVEEQGKLTDEWKKWEQAQKLQEVEDLYRPYKQKRRTRRTVAKEKG 127 

Query: 149 LFPLARLI - -LQNKDNLEEEAQNYLTDGFETTT- -KALSGAVDILIEAFSEDNKLRSWTY 204 

L PLA + L + +EA+ YL+ E T L GA DI+ E ++D LR 

Sbjct: 128 LEPLAEWLFSLPRDGDPLQEAEVYLSVEHELTKVEDVLQGAQDIIAEWIADDADLRKRIR 187 

Query: 205 IffilW^SSITAWKDESLDEKQVFKIYYDFSEKISKLHGYQVLALNRGEKMGVLKVNFEH 264 

+ 4 S+ A VK E LDEK V+++YYD+ E + L ++ LALNRGEK VL+V 
Sbjct: 188 SLGFKEGSVIAKVKKEELDEKGVYEMYfDYEEPWTLVPHRTIALNRGEKEDVLRVTIRF 247 



Query: 265 NLEKMFRF FAVRFKETS-QYIDDLIVQTVKKKIVPAMERRIRTELSEGAEDGAISL 319 

++++ F RF + Y+ I K+ I P++ER IR EL+E AE+ AI + 

Sbjct: 248 PVDRIIEMSEKTFIRRFGSPAVPYVKAAIEDGYKRLIEPSIEREIRHELTEKAEEQAIHI 307 

Query: 320 FSENLR1&LLVSPLKGKMVLGFDPAFRTGAKLAVVDQTGKLMTTQV1YPVPPANQAKIEQ 379 

F+ENLR+LLL P+RGK+VLG DPA+RTG KLA+VD+TGK++ QVIYP PP N+ + 
Sbjct: 308 FAENLRSLLLQPPIKGKVVLGLDPAYRTGCKLAIVDETGKVLDIQVIYPTPPKNE--VAA 365 

Query: 380 SKIELAKLIKEFNIEIIAIGNGTASRESEAFVA3VLQDFPD-VSYVIVNESGASVYSASE 438 

+K + KLI ++ +E+IAIGNGTASRESE F+A++++D P + Y+IVNE+GASVYSASE 
Sbjct: 366 AKKIVKKLIADYGVEMIAIGNGTASRESEQFIADLIKDLPQTIYYLIVNEAGASVYSASE 425 

Query: 439 IJ^HEFPDLTVEKRSAISIARRLQDPLAELVKIDPKSIGVGQYQHDVSQKKIAEMLDFW 498 

+ R EFPDL VE+RSA+SIARRLQDPLAELVKIDPKS+GVGQYQHDVSQK+L E+L FW 
Sbjct: 426 IGREEFPDLQVKERSAVSIARRLQDPLAELVKIDPKSVGVGQYQHDVSQKRIaNESLTFW 485 

Query: 499 ErVWQVGVmOTASPALLAHVSGLNKTISENIVKYREENGQIKSRAEIKKVPRLGAKAF 558 

ETVVNQVGVNVNTASP+LL +V+GL+KT+++NIVK REE G+ +RA++K +PRLGAK + 
Sbjct: 486 ETVTOQVGVMITOTASPSLLQYVAGLSKTVAKNIVKKREEAGRFTARAQLKDIPRLGAKTY 545 



Query: 616 AESIGVGQETLKDIIEDLLKPGRDLFJ3DFEAPVLPJJDVLDVSDLKVGQELQGTVRNVVDF 675 

A ++ VG TLKD+I+ L++P RD RD+ P+L+ DVL + DL G ELQGTVRNWDF 
Sbjct: 606 AATLDVGVPTLKDMIDALIRPTPJDPRDEVAKPLLKQDVLQLEDLLPGMELQGTVRNVVI3F 665 

Query: 676 GAFVDIGVHEDGLIHQSRLIKRKRDKKTRKMPPLQHPSKYLSVGDIVTWVVEVDAERSR 735 

G FVDIGV +DGL+H S+h R ++HP + ++VG+IVTWV +VD ++ R 

Sbjct: 666 GVFVDIGVKQDGLVHISKLANRY IKHPLEWTVGEIVTVWVEDVDIKKGR 715 

Query: 736 IGLSLIKPD 744 

I L++++P+ 
Sbjct: 716 IALTMLRPE 724 

A related DNA sequence was identified in S.pyogenes <SEQ ID 1127> which encodes the amino acid 
sequence <SEQ ID 1128>. Analysis of this protein sequence reveals the following: 



Final Results

bacterial cytoplasm --- Certainty=0.2207 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below: 

Identities = 532/716 (74%), Positives = 619/716 (86%), Gaps = 10/716 (1%) 
Query: 28 ^IlMLNITQIAIDLGIKASQIEKVIJELTDEGNTrPFIARYRKEMTGNLDEVQrKSIIDLDK 87 
Sbjct:

Query: 88 SMTALSDRKTTVLAKIEEQGICLTQELIQ<AIEEATKLADVEELYLPYKEKRRTKATIftREA 147

S+T L++RK T+LAKIEEQGKLT +L+ +IE KLAD+EELYLPYKEKRRTKATIAREA 
Sbjct: 61 SLTTLNERKATILAKIEEQt3KLTDQLRTSIEATE!OM)LEELYLjPYKEICRRTKATiaREA 120 

Query: 148 GLFPIARLILQNKDNLEEEAQNYLTDGPETTTKALSGAVDILIERFSEDNKLRSWTYHEI 207 

GLFPLARLILQN NLE A+ ++T+GP + +AL+GAVDIL+EA SED KLRSWTYNEI 
Sbjct: 121 GLFPLARLILQNAQNLETAAEPFVTEGFASPQEAIAGAVDILVEftMSEDAKLRSWTYNEI 180 

Query: 208 WIWSSITAVVKDESLDEKQVFKIYYDFSEKISKLHGYQVlAIiTOGEKMGVLKVNFEHWLE 267 

W YS + + +KDE LDEK+VF+IYYDFS+4+S + GY+ IALNRGEK+G+LKV+FEHNLE 
Sbjct: 181 WQYSRLVSTLKDEQLDEKKVFQIYYDFSDQVSNMQGYRTIAIMRGEKLGILKVSFEHNLE 240 

Query: 268 KMFRFFAVRFKETSQY1DDLIVQTVKKKIVPAMERRIRTELSEGAEDGAISLFSENLRNL 327 

KM RFF+VRFKET+ Y1+++I QT+KKKIVPAMERR+R+ELS+ AEDGAI LFSENLR+L 
Sbjct: 241 KMQRFFSVRFKETNPYIEEVINQTIKKKIVPAMERRVRSELSDAAEDGAIHLFSENLRHL 300 

Query: 328 LLVSPLKGKMVLGFDPAFRTGAKLAVVDQTGKLMTTQVIYPVPPAnQAKIEQSKIELAKL 387 

LLVSPLKGKMVLGFDPAFRTGAKLA+VDQTGKL+'TTQVIYPV PA+Q KI+ +K L +L 
Sbjct: 301 LLVSPLKGKMVLGFDPAFRTGAKLAIVDQTGKLLTTQVIYPVAPASQTKIQAAKETLTQL 360 

Query: 388 IKEFNIEIIAIGNGTASRESEAFVAEVLQDFPDVSYVIVNESGASVYSASEIARHEFPDL 447 

1+ + 1+IIAIGNGTASRESEAFVA4VL+DFP+ SYVIVNESGASVYSASELARHEFPDL 
Sbjct: 361 IETYQIDIIAIGNGTASRESEAFVADVLKDFPNTSYVIVNESGASVYSASELARHEFPDL 420 

Query: 448 TWKRSAISIARRLQDPLAELWIDPKSIGVGQYQHDVSQKKLAENLDFVVETVVNQVGV 507 

TVEKRSAISIARRLQDPLAELVKIDPKSIGVGQYQHDVSQKKL+ENL FW+TWNQVGV 
Sbjct: 421 TVEKRSAISIAMU.QDPLAELVKI3PKSIGVGQYQHDVSQKKliSEKLGFWDTVVSIQVGV 480 

Query: 508 NVOTASPALLAHVSGLNKTISENIVKYREENGQIKSRAEIKKVPRLGAKAFEQAAGFLRI 567 

NVNTASP+LLAHVSGLNKTISEMIVKYREENG + SRA+ 1 KKVPRLGAKAFEQAAGFLRI 
Sbjct: 481 NVKTASPSLLAHVSGMKTISENIVKYREENGALTSRADIJCKVPRI^RKAFEQAAGFLRI 540 

Query: 568 PNAKNFLDNTGVHPESYEAVKKLLDQLTIKELDDLAKEKLQNLDLIATAESIGVGQETLK 627 

P AKN LDNTGVHPESY AVK+L L I++LDD AK L + + AE++ +GQETLK 
Sbjct: 541 PGSAKNILDOTGVHPESYPAVKELFCT1GIQDLDDAAKATLAAVQVPQMAETIAIGQETLK 600 

Query: 628 DIIEDLLKPGRDLRDDFEAPVLFJIDVLDVSDLKVGQELCGTVRNVVDFGAFVDIGVHEDG 687 

DII DLLKPGRDLRDDFEAP+LR D+LD+ DL++GQ+L+GTVRNWDFGAFVDIGVHEDG 
Sbjct: 601 DIIADLLKPGRDLRDDFEAPILRQDILDLKDBEIGQKLEGTVRNVVDFGAFVDIGVHEDG 660

Query: 688 LIHQSRLIKRKRDKKTRKMPPLQHPSKYLSVGDIVTVWWEVDAERSRIGLSLIKP 743 

LIH S + K + HPS+ +SVGD+VTVWV ++D +R ++ LSL+ P 

Sbjct: 661 LIHISEMSKTF yiWPSQWSVGDbVTVWSKIDLDRHKVNLSIjIiPP 706 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 348 

A DNA sequence (GBSx0379) was identified in S.agalactiae <SEQ ID 1129> which encodes the amino 
acid sequence <SEQ ID 1130>. This protein is predicted to be N5,N10-methylenetetrahydromethanopterin 
reductase homolog. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0. 4864 (Affirmative) < suco 

bacterial membrane Certainty=0 . 0000 (Not Clear) < suco 

bacterial outside --- Certainty=0 . 0000 (Not Clear) < suco 

The protein has homology with the following sequences in the GENPEPT database: 








>GP:AAB94650 GB:U96107 N5,N10-methylenetetrahydromethanopterin
reductase homolog [Staphylococcus carnosus]
Identities = 164/300 (54%), Positives = 217/300 (71%), Gaps = 1/300 (0%)

Query: 45
D+AVS P VLAA A T I+LSSAVT+LSS+DP+ VY++F+T+DA+SN
Sbjct: 1 MYGLGEHHRSDYAVSDPVTVIJUmASLTQRIKLSSAVTVLSSDDPVCVYERFATIjDAVSN 60

Query: 105
GRAEIM GRGSFIESFPLFGYDL DYD LF EK+++L IN

Query: 165
+YPRA+Q ++PIW+ATGG +S+IR AE GLPI YA IGGNPK F+ +
Sbjct: 121

Query: 225
:c R +W T+
Sbjct: 181

Query: 285
GA+FVGSPE VA K+I ++E h L+RFMLH+PVGSMPH+ ++ A1KLYGK V PI+ YF
Sbjct: 240



No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics. 

Example 349 

A DNA sequence (GBSx0380) was identified in S.agalactiae <SEQ ID 1131> which encodes the amino 
acid sequence <SEQ ID 1132>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1310 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9715> which encodes amino acid sequence <SEQ ID 9716> 
was also identified. 

The protein has no significant homology with any sequences in the GENPEPT database. 

A related DNA sequence was identified in S.pyogenes <SEQ ID 1133> which encodes the amino acid 
sequence <SEQ ID 1134>. Analysis of this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm Certainty=0. 0915 (Affirmative) < suco 

bacterial membrane Certainty=0. 0000 (Not Clear) < suco 

bacterial outside Certainty=0. 0000 (Not Clear) < suco 

An alignment of the GAS and GBS proteins is shown below: 

Identities = 20/40 (50%) , Positives = 27/40 (67%) , Gaps = 3/40 (7%)



MAITHK+ D+LE M A FA +P 
: 1 MAITHKKNDELEKMLAGFAS I PSFBKPLEVNTDGKLATKE 40



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 350 

A DNA sequence (GBSx0381) was identified in S.agalactiae <SEQ ID 1135> which encodes the amino
acid sequence <SEQ ID 1136>. Analysis of this protein sequence reveals the following:



>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1453 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 351

A DNA sequence (GBSx0382) was identified in S.agalactiae <SEQ ID 1137> which encodes the amino
acid sequence <SEQ ID 1138>. Analysis of this protein sequence reveals the following:

Possible site: 37
>>> Seems to have an uncleavable N-term signal seq
INTEGRAL Likelihood =-11.15 Transmembrane 216 - 232 ( 210 - 240)
INTEGRAL Likelihood = -9.18 Transmembrane 15 - 31 ( 10 -
INTEGRAL Likelihood = -9.02 Transmembrane 283 - 299 ( 276 - 299)
INTEGRAL Likelihood = Transmembrane 128 - ( 119 - 150)
INTEGRAL Likelihood = -4.62 Transmembrane 243 - 259 ( 237 - 265)
INTEGRAL Likelihood = -2.44 Transmembrane 65 - 81 ( 65 - 81)
INTEGRAL Likelihood = -2.44 Transmembrane 94 - 110 ( 93 - 111)

Final Results

bacterial membrane --- Certainty=0.5458 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAB12119 GB:Z99105 ycgR [Bacillus subtilis] 
Identities = 141/283 (49%) , Positives = 198/283 (69%) , Gaps = 3/283 (1%) 

Query: 10 S VLQWFAI FISH IEALPFVLLGTILSGI IEVFITPDI VNKFLPKNKFLRVLFGTFVGFV 69 

S LQ +IFISI+IEA+PF+L+G ILSGII++F++ +++ + +PKN+FL VLFG G + 
Sbjct: 6 SFLQLNSIFISILIEAIPFILIGVILSGIICMFVSEEMIARIMPKNRFLAVLFGALAGVL 65 

Query: 70 FPSCECGIIPIINRFLEKKVPSYTAVPFLATAPIINPIVLFATYSAFGNSIRFLILRFVG 129 

FP+CECGIIPI R L K VP + V F+ TAPIINPlvLF+TY AFGN + R 
Sbjct: 66 FPACECGIIPITRRLLLKGVPLHAGVAFMLTAPIINPIVLFSTYIAFGNRWSVVFYRGGL 125 

Query: 130 ATIVAIALGVMLAFLVDDNILKEDAKPTHFHDYSDKKWYQKIFLALAHAIDEFFDTGRYL 189

A V4-+ +GV+L++ DN L + +P H H + QK+ L HAIDEFF G+YL 

Sbjct: 126 ALAVSLIIGVILSYQFKDNQLLKPDEPGHHHHHHGTL-LQKLGGTLRHAIDEFFSVGKYL 184 

Query: 190 VFGTLIASAMQIYLPTRVLTTIGHSPITAILVIMiLAFILSLCSEADAFIGASLLSTFGI 249 
+ G IA+AMQ Y+ T L IG + +++ LVMM LAF+LSLCSE DAFI +S STF +
Sbjct: 185 riGAFIAAAMQTYVKTSTLLAIGQNDVSSSLVMMGLAFVLSLCSEVDAFIASSFSSTFSL 244

Query: 250 APVMAFLLIGPMID1KNLMMMVNSFKTRF1VQFISVSSLIIII 292 
++AFL+ G M+DIKNL+MM+ +FK RK+ F+ ++

Sbjct: 245 GSLIAFLVFGAMVDIKNLLMMLAAFKKRFV- -FLLITYIWIV 285 



A related DNA sequence was identified in S.pyogenes <SEQ ID 1139> which encodes the amino acid 
sequence <SEQ ID 1140>. Analysis of this protein sequence reveals the following:



Possible site: 25
>>> Seems to have an uncleavable N-term signal seq
INTEGRAL Likelihood = Transmembrane 216 - 232 ( 211 - 237)
INTEGRAL Likelihood = Transmembrane 283 - 299 ( 276 - 299)
INTEGRAL Likelihood = Transmembrane 128 - 144 ( 119 - 150)
INTEGRAL Likelihood = Transmembrane 15 - 31 ( 10 - 39)
INTEGRAL Likelihood = Transmembrane 243 - 259 ( 237 - 265)
INTEGRAL Likelihood = Transmembrane 65 - 81 ( 65 - 81)
INTEGRAL Likelihood = Transmembrane 94 - 110 ( 93 - 111)

Final Results

bacterial membrane --- Certainty=0.4970 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>


The protein has homology with the following sequences in the databases:

>GP:CAB12119 GB:Z99105 ycgR [Bacillus subtilis] 
Identities = 143/288 (49%), Positives = 196/288 (67%), Gaps = 1/288 (0%)

Query: 10 SVLQWFAIFMSI I IEALPFVLLGTILSGCIEVFVTPELVQKYLPKQKCLRILFGTFVGFV 69 
S LQ +IF+SI+IEA+PF+L+G ILSG I++FV+ E++ + +PK + L +LFG G +

Sbjct: 6 SFLQLNSIFISILIEAIPFILIGVILSGIIQMFVSEEMIARIMPKNRFLAVLFGALAGVL 65 

Query: 70 FPSCECGIIPIINRFLEKKVPSYTAVPFLATAPIINPIVLFATYSAFGNSLRFLILRLVG 129 
FP+CECGIIPI R L K VP + V F+ TAPIINPIVLF+TY AFGN + R 
Sbjct: 66 FPACECGIIPITRRLLLKGVPLHAGVAFMLTAPIINPIVLFSTYIAFGNRWSWFYRGGL 125



Query: 130 ARLVAITLGVMLAFIVDDNILKDNAQPVHFHDYSHSSLPKRIYLALVHAIDEFFDTGRYL 189 

A V++ +GV+L++ DN L +P H H + H +L +++ L HAIDEFF G+YL 
Sbjct: 126 ALAVSLIIGVILSYQFKDNQLLKPDEPGH-HHHHHGTLLQKLGGTLRHAIDEFFSVGKYL 184 

Query: 190 VFGTLIASAMQIYVPTRVLTTIGHNPLTAILIMMLMAFILSLCSEADAFIGASLLSTFGV 249 

+ G IA+AMQ YV T L IG N +++ L+MM +AF+LSLCSE DAFI +S STF + 
Sbjct: 185 IIGAFIAAAMQTYVKTSTLLAIGQNDVSSSLVMMGLP.FVLSLCSEVDAFIASSFSSTFSL 244 

Query: 250 APVLAFLLIGPMVDIKNLMM^mAFKGRFIVQFTGVSvLMIAVYCLLV 297 

++AFL+ G MVDIKNL+MM+ AFK RF+ I V+++ LLV 
Sbjct: 245 GSLIAFLVFGAMVDIKNLLMMLARFKKRFVFLLITYIVVIVIAGSLLV 292 

An alignment of the GAS and GBS proteins is shown below: 

Identities = 248/300 (82%), Positives = 278/300 (92%)

Query: 1 MDIFNQLPDSVLQWFAIFISIIIEALPFVLLGTILSGIIEVFITPDIVNKFLPKNKFLRV 60
M +F+ LP SVLQWFAIF+SIIIEALPFVLLGTILSG IEVF+TP++V K+LPK K LR+
Sbjct: 1 MSLFSNLPPSVLQWFAIFMSIIIEALPFVLLGTILSGCIEVFVTPELVQKYLPKQKCLRI 60

Query: 61 LFGTFVGFVFPSCECGIIPIINRFLEKKVPSYTAVPFLATAPIINPIVLFATYSAFGNSI 120
LFGTFVGFVFPSCECGIIPIINRFLEKKVPSYTAVPFLATAPIINPIVLFATYSAFGNS+
Sbjct: 61 LFGTFVGFVFPSCECGIIPIINRFLEKKVPSYTAVPFLATAPIINPIVLFATYSAFGNSL 120

Query: 121 RFLILRFVGATIVAIALGVMIAFLVDDNILKEDAKPTHFHDYSDKKMYQKIFLALAHAID 180
RFLILR VGA +VAI LGVMLAF+VDDNILK++A+P HFHDYS + ++I+LAL HAID
Sbjct: 121 RFLILRLVGAALVAITLGVMLAFIVDDNILKDKAQPVHFHDYSHESLPKRIYLALVHAID 180

Query: 181 EFFDTGRYLVFGTLIASAMQIYLPTRVLTTIGHSPITAILVMMLLAFILSLCSEADAFIG 240
EFFDTGRYLVFGTLIASAMQIY+PTRVLTTIGH+P+TAIL+MML+AFILSLCSEADAFIG
Sbjct: 181 EFFDTGRYLVFGTLIASAMQIYVPTRVLTTIGHNPLTAILIMMLMAFILSLCSEADAFIG 240

Query: 241 ASLLSTFGIAPVMAFLLIGPMIDIKNLMMMVNSFKTRFIVQFISVSSLIIIIYCLFVGVI 300
ASLLSTFG+APV+AFLLIGPM+DIKNLMMMV +FK RFIVQFI VS L+I +YCL VGV+
Sbjct: 241 ASLLSTFGVAPVLAFLLIGPMVDIKNLMMMVKAFKGRFIVQFIGVSVLMIAVYCLLVGVL 300



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 352 

A DNA sequence (GBSx0383) was identified in S.agalactiae <SEQ ID 1141> which encodes the amino 
acid sequence <SEQ ID 1142>. Analysis of this protein sequence reveals the following: 

Possible site: 13

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4703 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 
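The "Final Results" listings in these examples can be read programmatically. The following is a minimal sketch only, not part of the original analysis: the function names `parse_final_results` and `predicted_location` are hypothetical, and the pattern assumes the line layout shown in the listings (it also tolerates the stray spaces inside numbers that occur in scanned copies).

```python
import re

# Hypothetical helper (an assumption, not the tool that produced the listing):
# reads lines such as
#   "bacterial cytoplasm --- Certainty=0.4703 (Affirmative) < succ>"
# and tolerates stray spaces inside the number, e.g. "0 .4703".
_LINE = re.compile(
    r"bacterial\s+(?P<site>\w+)\W*Certainty\s*=\s*"
    r"(?P<value>[\d. ]+?)\s*\((?P<label>[^)]*)\)"
)


def parse_final_results(text):
    """Return {site: (certainty, label)} for every 'bacterial ...' line."""
    results = {}
    for match in _LINE.finditer(text):
        value = float(match.group("value").replace(" ", ""))
        results[match.group("site")] = (value, match.group("label"))
    return results


def predicted_location(text):
    """Return the site with the highest certainty value."""
    results = parse_final_results(text)
    return max(results, key=lambda site: results[site][0])


report = """
bacterial cytoplasm --- Certainty=0.4703 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
"""
print(predicted_location(report))
```

The sketch simply picks the site with the largest certainty value; how the prediction program itself derives or labels these values is not described in the text.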

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics.

Example 353 

A DNA sequence (GBSx0384) was identified in S.agalactiae <SEQ ID 1143> which encodes the amino 
acid sequence <SEQ ID 1144>. Analysis of this protein sequence reveals the following: 

Possible site: 50

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood = -8.44 Transmembrane 45 - 61 ( 39 - 65)
INTEGRAL Likelihood = -8.12 Transmembrane 83 - 99 ( 77 - 101)
INTEGRAL Likelihood = -0.00 Transmembrane 2 - 18 ( 1 - 19)

Final Results

bacterial membrane --- Certainty=0.4376 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>
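The INTEGRAL lines above each give a likelihood value and a predicted transmembrane span. As an illustration only, the sketch below extracts those spans from such a listing; the `Segment` type and the function `transmembrane_segments` are hypothetical names introduced here, and the pattern assumes the line layout shown in this example.

```python
import re
from typing import List, NamedTuple


class Segment(NamedTuple):
    likelihood: float  # value printed after "Likelihood ="
    start: int         # first residue of the predicted span
    end: int           # last residue of the predicted span


# Assumed layout: "INTEGRAL Likelihood = -8.44 Transmembrane 45 - 61 ( 39 - 65)".
# Only the first residue range is captured; the range in parentheses is ignored.
_TM = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+\.\d+)\s+"
    r"Transmembrane\s+(\d+)\s*-\s*(\d+)"
)


def transmembrane_segments(text: str) -> List[Segment]:
    """Return the predicted segments ordered by start position."""
    found = [Segment(float(l), int(s), int(e)) for l, s, e in _TM.findall(text)]
    return sorted(found, key=lambda seg: seg.start)


listing = """
INTEGRAL Likelihood = -8.44 Transmembrane 45 - 61 ( 39 - 65)
INTEGRAL Likelihood = -8.12 Transmembrane 83 - 99 ( 77 - 101)
INTEGRAL Likelihood = -0.00 Transmembrane 2 - 18 ( 1 - 19)
"""
segments = transmembrane_segments(listing)
for seg in segments:
    print(seg.start, seg.end, seg.likelihood)
```

Sorting by start position makes it easy to compare the span order between related sequences, such as the GBS and GAS proteins of one example.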



A related GBS nucleic acid sequence <SEQ ID 8559> which encodes amino acid sequence <SEQ ID 8560> 
was also identified. Analysis of this protein sequence reveals the following:

Lipop Possible site: -1 Crend: 2
SRCFLG: 0
McG: Length of UR: 8
Peak Value of UR: 2.23
Net Charge of CR: 1
McG: Discrim Score: 0.46
GvH: Signal Score (-7.5): -3.54
Possible site: 42
>>> Seems to have an uncleavable N-term signal seq
Amino Acid Composition: calculated from 1
ALOM program count: 2 value: -8.44 threshold: 0.0
INTEGRAL Likelihood = -8.44 Transmembrane 37 - 53 ( 31 - 57)
INTEGRAL Likelihood = -8.12 Transmembrane 75 - 91 ( 69 - 93)
PERIPHERAL Likelihood = 2.76 200
Step: 3

Final Results

bacterial membrane --- Certainty=0.4376 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 



Query: 9 MIRFLIIAGyFELSmLKLSGKmQYIKT'HYTYiAYISMVLSFlLAIVQLlIWViattlKMH 68 

M R L+L G+ +L SG L +YIN Y YL++I++ L 1L VQ +++K+ + 

Sbjct: 1 MFRLLVLMGFTFFFYHLHASGNLTKYINMKYAYLSFIAIFLLAILTAVQAYLFIKSPEKS 60 

Query: 69 SHLHGKIA KSTSP MILVFPvLVGLLVPTOSLDSTTVSAKGYN 110 

H H + P ++ +FP++ G+ P +LDS+ V KG++ 

Sbjct: 61 GHHHDHDCGCGHDHEHDHEQNKPFYQRYLIYWFLFPLVSGIFFPIATLDSSIVKTKGFS 120 

Query: 111 FPLAAGSTGTVSQDGTRVQYLKPDTSTYFTSSAYEKEMQKELKKYKGSGTLTITTENYME 170 

FAS SQ QYL+PD S Y+ +Y+K+M++ KY +++T +++++ 

Sbjct: 121 FK-AMESGDHYSQ TQYLRPDASLYYAQDSYDKQMKQLFNKYSSKKEISLTDDDFLK 175 

Query: 171 VMELIYLYPEQFMDRQIQYTGFW-NEPKHEGYQFIFRFGIIHCIADSGWGLLTT-GNQ 228 

ME IY YP +F+ R I++ GF Y ++ F+ RFGIIHCIADSGVYG+L 

Sbjct: 176 GMETIYNYPGEFLGRTIEFHGFAYKGNAINKNQLFVLRFGIIHCIADSGVYGMLVEFPKD 235 

Query: 229 KSYPDNTWVTVRGTIKSEYNQLLCQNLPVLHIEESRQVSKANNPYVYRVF 278 

D+ W+ ++GT+ SEY Q + LPV+ + + + K ++PYVYR F 
Sbjct: 236 MDIKDDEWIHIKGTIASEYTQPFKSTLPVVKVTDWNTIKKPDDPYVYRGF 285 

A related DNA sequence was identified in S.pyogenes <SEQ ID 1145> which encodes the amino acid 
sequence <SEQ ID 1146>. Analysis of this protein sequence reveals the following:

Possible site: 60

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood = -8.33 Transmembrane 83 - 99 ( 74 - 101)
INTEGRAL Likelihood = -6.21 Transmembrane 42 - 58 ( 39 - 62)

Final Results

bacterial membrane --- Certainty=0.4333 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



A related sequence was also identified in GAS <SEQ ID 9115> which encodes the amino acid sequence 
<SEQ ID 9116>. Analysis of this protein sequence reveals the following:

Possible cleavage site: 54
>>> Seems to have an uncleavable N-term signal seq
INTEGRAL Likelihood = -8.33 Transmembrane
INTEGRAL Likelihood = -6.21 Transmembrane
PERIPHERAL Likelihood = 2.76

Final Results

bacterial membrane --- Certainty= 0.433 (Affirmative) < succ>
bacterial outside --- Certainty= 0.000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty= 0.000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 208/279 (74%), Positives = 244/279 (86%), Gaps = 1/279 (0%)

Query: 1 MFICGGNIMIRFLILAGYFELSIlfl 1 LS LUC lirHYTYLAYISMVLSFIIAIVQI.il 60

+F CGG +MIRFLIIAGYFEL+MYL+LSGKL.+QYIN Y-fYLAYISM+LSFILA+VQL, 
Sbjct: 1 LFTCGGALMIRFLIIAGYFELTMYLQLSGKIiDC^INTOYSYIjAYISMILSFILAIiVQLYT 60 

Query: 61 WVKMKMHSHLHGKIAKSTSPMILVFPVLVGLLVPTVSLDSTTVSAKGYNFPLAAGSTGT 120 

W+KN+K+HSHL GKIA+ TSP ILVFPVL+GLLVPTV+LDSTTVSAKGY FPIAAG++ T 
Sbjct: 61 TO1KNIKVHSHLTGKIARLTSPFILVFPVLIGLLVPTVTLDSTTVSAKGYTFPLAAGASKT 12 0 

Query: 121 -VSQDGTRVQYLKPDTSTYFTSSAYEKEMQKELKKYKGSGTLTITTENYMEVMELIYLYP 179 

VS DGT +QYLKPDTS YFT SAY+KEM++EL, KYKG +TITTEHYMEVMELIYLYP 
Sbjct: 121 GVSDDGTTIQYLKPDTSLYFTKSAYQKEMRQELHKYKGKKPVTITTENYMEVMELIYLYP 180 

Query: 180 EQF^RQIQYTGFVYNEPKHEGYQFIFRFGIIHCIADSGVYGLLTTGWQKSYPDOTWTV 239 

++F+DR IQYTGFVYNEP H+ YQF+FRFGI IHCIADSGVYGLLTTGNQ SYP+NTW+TV 
Sbjct: 181 DEFLDRDIQYTGFVYNEPGHDNYQFLFRFGIIHCrADSGVYGLLTTGNQTSYPNNTWLTV 240 

Query: 240 RGTIKSEYNQLLQQNLPVLHIEESRQVSKANNPYVYRVF 278 

+G + EY++ L+Q+LPVT, + E Q + NNPYVYRVF 
Sbjct: 241 KGRLHMEYDKNLEQHLPVLQLAEVHQTKEPHNPYVYRVF 279 
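Each alignment in this document is introduced by a summary line of the form "Identities = a/b (p%), Positives = c/d (q%), Gaps = e/f (r%)". The sketch below, offered only as an illustration, extracts the counts from such a line and recomputes the fractions; the function names are hypothetical. Recomputed percentages may differ by a point from the printed ones, depending on how the alignment program rounds.

```python
import re

# Matches the three count pairs in a summary line; the printed percentages
# in parentheses are deliberately ignored and recomputed from the counts.
_PAIR = re.compile(r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)")


def alignment_summary(line):
    """Return {'identities': (n, total), ...} for the fields present."""
    return {name.lower(): (int(n), int(d)) for name, n, d in _PAIR.findall(line)}


def fraction(summary, key):
    """Return the count for `key` as a fraction of the alignment length."""
    numerator, denominator = summary[key]
    return numerator / denominator


line = "Identities = 208/279 (74%), Positives = 244/279 (86%), Gaps = 1/279 (0%)"
summary = alignment_summary(line)
print(summary["identities"], round(fraction(summary, "identities"), 4))
```

Working from the raw counts rather than the printed percentages avoids carrying rounding differences into any downstream comparison of the GAS and GBS proteins.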

SEQ ID 8560 (GBS235d) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total 
cell extract is shown in Figure 146 (lane 14 & 15; MW 48.5kDa). It was also expressed in E.coli as a His- 
fusion product. SDS-PAGE analysis of total cell extract is shown in Figure 146 (lane 17 & 18; MW 
23.4kDa), in Figure 150 (lane 15; MW 23kDa) and in Figure 182 (lane 5; MW 23kDa). 

GBS235d-His was purified as shown in Figure 235, lane 6-7. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 354 

A DNA sequence (GBSx0385) was identified in S.agalactiae <SEQ ID 1147> which encodes the amino 
acid sequence <SEQ ID 1148>. This protein is predicted to be signal recognition particle (ftsY). Analysis of 
this protein sequence reveals the following:

N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3301 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database:

>GP:BAB06205 GB:AP001515 signal recognition particle (docking 
protein) [Bacillus halodurans]
Identities = 175/304 (57%), Positives = 227/304 (74%)

Query: 233 EKyjKSLKKTRTGFSARLNAFLSNFRRVDEEFFEELEEMLILSDVGVNVATQLTEDLRYE 292
EK+ L+KTR F+ ++N + +R VDE+FFEELEE+LI +DVGV h E+L+ E
Sbjct: 20 EKFKAGLEKTRDSFAGKMNDLWKYRSVDEDFFESLESILIGADVGVTTVMDLVEELKDE 79

Query: 293 AKLENAKKSEDLKRVITOKLWIYEKEGIYIilEAINFQEGLTWILFVGWGVGKTTSIGKL 352
+ +N K S+D++ +I EKL E+ EK+G E GL+V+L VGVNGVGKTTSIGKL
Sbjct: 80 VPJIQNiroSKDIQPIISEKIAELLEKEGGETEVNLQPAGLSVILWGVNGVGKTTSIGKL 139

Query: 353 AHQYKSQGKKVMLVAADTFRAGAVAQLVEWGRRVDVPVVTGEEKADPASVVFDGMEKAVA 412
AH YK QGKKV+L A DTFRAGA+ QL WG R V V+ E +DPA+V+FD ++ A +
Sbjct: 140 AHMYKQQGKKVILAAGDTFRAGAIEQLEVWGERAGVDVIKQSEGSDPAAVMFDAIQAAKS 199

Query: 413 QGVDVLLIDTAGRLQNKENLMAELEKIGRIIKRVVPDAPHETLLALDASTGQNALSQAKE 472
+ D+L+ DTAGRLQNK NLM ELEK+ R+I R +P APHE L+ALDA+TGQNA+SQAK
Sbjct: 200 READILICDTAGRLQNKVNLMKELEKVKRVISREIPGAPHEVLIALDATTGQNAMSQAKT 259

Query: 473 FSKITPLTGLILTKIDGTAKGGVVLAIRQELDIPVKFIGFGEKIDDIGEFNSEDFMRGLL 532
F + T +TG+ILTK+DGTAKGG+VLAI R ELDIPVKF+G GEKIDD+ F+SE F+ GL
Sbjct: 260 FKETTDVTGIILTKLDGTAKGGIVLAIRHELDIPVKFVGLGEKIDDLQPFDSEQFVYGLF 319

Query: 533 EGIL 536
+ ++
Sbjct: 320 KDMV 323

A related DNA sequence was identified in S.pyogenes <SEQ ID 1149> which encodes the amino acid 
sequence <SEQ ID 1150>. Analysis of this protein sequence reveals the following:

Possible site: 60
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4384 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below: 

Identities = 339/549 (61%), Positives = 404/549 (72%), Gaps = 46/549 (8%)

Query: 1 MGLFDRLFGHKKKDKEPEIEASESVVLEDEDSVIDKEEGSNFSKESTLNRTSEVPVAEDD 60
MGLFDRLFG K+ K E + E+++ E KEE S + E ++ + +
Sbjct: 1 MGLFDRLFGKKETPKVAEEKLEENLLTE TTQKEELSEKANEQ DKIEAVQQE 51

[Alignment blocks from Query 61 onward: the residue rows are not recoverable from the source. Block start positions: Query 61 / Sbjct 52; Query 121 / Sbjct 93; Query 179 / Sbjct 150; Sbjct 209; Query 288 / Sbjct 268; Query 348 / Sbjct 328; Query 408 / Sbjct 388; Query 468 / Sbjct 448; Query 528 / Sbjct 508. Surviving match lines, in order:]

t- P + ++ L E+T
S+++ S+ + L D
E++ S + E SQ++
+LRYEAKLENAKK + LKRVI VEKLV+ 1 YEKDG YNEAIN+Q+GLTVMLFVGVNGVGKTT
SIGKLA++YK +GKKVMLVAADTFRAGAVAQLVEWGRRVDVPV+TG EKADPASWFDGM
EKAVA+GVD+LLIDTAGRLQNKENLMAELEK+GRI I KRV+ PDAPEETLLALDASTGQNAL
SQAKEFSKITPLTGLILTKIDGTAKGGWLAIRQELDIPVKFIGFGEK+DDIGEF+SEDF

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics.

Example 355 

A DNA sequence (GBSx0386) was identified in S.agalactiae <SEQ ID 1151> which encodes the amino 
acid sequence <SEQ ID 1152>. Analysis of this protein sequence reveals the following: 

N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3592 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAA62048 GB:L10328 f270 [Escherichia coli] 
Identities = 101/273 (36%), Positives = 160/273 (57%), Gaps = 10/273 (3%)

Query: 4 IKILALDLDGTLFTTDKKVSEENKVALKAAREKGIKVVITTGRPLKAIGNLLEDLELVSD 63
IK++A+D+DGTL D +S K A+ AAR +G+ VV+TTGRP + N L++L +
Sbjct: 3 IKLIAID^GTLLLPDHTISPAVT^%IAfARARGVWVT^TTGRPYAGVHOTLKELHl>ffiQP 62

Query: 64 EDYSITFNGGLVQQNT-GKILAKTAMTRQEVEDIHEELYQVGLPTDILSEGTVYS I 118
DY IT+NG LVQ+ G +A+TA++ + + + +VG L T+Y+ I
Sbjct: 63 GDYCITYNGALVQKAADGSTVAQTALSYDDYRXLEKLSREVGSHFHALDRTTLYTANRDI 122

Query: 119 ANKGHHSQYHLANPLLEFIEVDDLEQVPKDVVYNKIVSVIDATYLDQSIAKLPDRLKVDY 178
+ H + PL+ F E E++ + + K++ + + LDQ IA++P +K Y
Sbjct: 123 SYYTVHESFVATIPLV-FCEA EKMDPNTQFLKVMMIDEPAILDQAIARIPQXVKEKY 178

Query: 179 EMFKSRDIILELMPKGVHKAVGLELLTKHLGLDSSQVMAMGDEANDLSMLEWAGLGVAMA 238
+ KS LE++ K V+K G++ L LG+ ++MA+GD+ ND++M+E+AG+GVAM
Sbjct: 179 TVLKSAPYFLEIIjDKRVNKGTGVKSLADVlGIKPEEIMAIGDQEroiAMIEYAGVGVAMD 238

Query: 239 NGIPEAKAIAKATTICNNDESGVAEAIGKYILS 271
N IP K +A T +N E GVA AI KY+L+
Sbjct: 239 NAIPSVKEVANFVT-KSNLEDGVAFAIEKYVWJ 270



A related DNA sequence was identified in S.pyogenes <SEQ ID 1153> which encodes the amino acid 
sequence <SEQ ID 1154>. Analysis of this protein sequence reveals the following:

N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3502 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 180/273 (65%), Positives = 218/273 (78%), Gaps = 1/273 (0%)

Query: 3 DIKILALDLDGTLFTTDKKVSEENKVALKAAREKGIKVVITTGRPLKAIGNLLEDLELVS 62
+ I+ILALDLDGTL+ T+K V++ NK AL AAREKG+KVVITTGRPLKAIGNLLE+L+L+
Sbjct: 2 NIRIIALDLDGTLYOTEKIVTDANKKALAAAREKGVKVVITTGRPLKAIGNLLEELDLLD 61

Query: 63 DEDYSITFNGGLVQQNTGKILAKTAMTRQEVEDIHEELYQVGLPTDILSEGTVYSIANK- 121
+DYSITFNGGLVQ+NTG++L K++++ +V I + L VGLPTDI+S G VYSI +K
Sbjct: 62 HDDYSITFNGGLVQRNTGEVLDKSSLSFDQVCQIQQALEAVGLPTDIISGGDVYSIPSKD 121

Query: 122 GHHSQYHLANPLLEFIEVDDLEQVPKDVVYNKIVSVIDATYLDQQIAKLPDRLKVDYEMF 181
G HSQYHLANPLL FIEV + ++PKD+ YNKIV+V D +LDQQI KL L D+E F
Sbjct: 122 GRHSQYHLANPLLTFIEVTSVAELPKDITYNKIVTVTDPDFLDQQIIKLSPSLFEDFEAF 181

Query: 182 KSRDIILELMPKGVHKAVGLELLTKHLGLDSSQVMAMGDEANDLSMLEWAGLGVAMANGI 241
KSRDII E+MPKG+ KA GL LL +HLGLD+ VMAMGDEAND +MLEWAGLGVAMANG+
Sbjct: 182 KSRDIIFEIMPKGIDKAFGLNLLCQHLGLDARHVMAMGDEANDFAMLEWAGLGVAMANGV 241

Query: 242 PEAKAIAKATTICNNDESGVAEAIGKYILSEEN 274
AKA A A T NDESGVAEA+ +IL EE+
Sbjct: 242 SGAKADADAVTTLTNDESGVAEAVKTFILEEES 274

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 



Example 356 

A DNA sequence (GBSx0387) was identified in S.agalactiae <SEQ ID 1155> which encodes the amino 
acid sequence <SEQ ID 1156>. Analysis of this protein sequence reveals the following:

Possible site: 49

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4648 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database:

>GP:BAA35556 GB:D90723 Hypothetical 30.2 kd protein in idh-deoR 
intergenic region. [Escherichia coli]
Identities = 91/264 (34%), Positives = 146/264 (54%), Gaps = 4/264 (1%)

Query: I

Sbjct: <

Query: 62 AFIAENGSAaVLFlffiLAYEQHLSREQYLDIIDHLSKSPYMENlNEYVLSGKDGAYILSDAN 121
AF+AENG V + + LS++ + +++HL P + E + GK+ AY L +
Sbjct: 64 AFVAENGGWWSEGKDVFNGELSKDAFATWEHLLTRPEV EIIACGKNSAYTLKKYD 120

Query: 122 PDYIEFITHYYDNLQKVSHFEDVDDIIFKVTANFTEETVRQAEEWVNQAI-PYATAVTTG 180
YY L+ V +F++++DI FK N ++E + Q ++ +++AI +V TG
Sbjct: 121 DAMKTVAEMYYHRLEYVDNFDNLEDIFFKFGDNLSDELIPQVQKALHEAIGDIMVSVHTG 180

Query: 181 FKSIDIILSSVNKRNGLEHLCEQYGIRAEEVLSFGDNINDLEMLEWSGKAIATENARPEV 240
SID+I+ V+K NGL L + +GI EV+ FGD ND+EML +G + A ENA V
Sbjct: 181 NGSIDLIIPGVHKANGLRQLQKLWGIDDSEVVVFGDGGNDIEMLRQAGFSFAMENAGSAV 240

Query: 241 !

A G +N + V+ ++ ++
Sbjct: 241 VAAAKYRAGSNNREGVLDVIDKVL 264



A related DNA sequence was identified in S.pyogenes <SEQ ID 1157> which encodes the amino acid 
sequence <SEQ ID 1158>. Analysis of this protein sequence reveals the following:

Possible site: 49

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3401 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below:

Identities = 138/265 (52%), Positives = 193/265 (72%), Gaps = 1/265 (0%)

Query: 1 MIKJjVATDMDGTFDDENGTYDKKRLANVDKKFECEQGIVFTAaSGRSLLSLEQLPADFRDQ 60
MIKL+ATDMDGTPL E+GTY++++LA +L K E+GI+F +SGRSLL+++QLF F DQ
Sbjct: 1 MIKLIATDMDGTFLAEDGTYNQEQIAALLPKIAEKGILFAVSSGRSLLAIDQLFEPFLDQ 60

Query: 61 I^FIAENGSAftVLFlTOLAyEQHLSREQYLDIIDHLSKSPyMENNEYVLSGKDGAyiLSDA 120
+A IAENGS + + +++EQY ++ + +P+ V SG+ AYIL A
Sbjct: 61 IAVIAENGSWQYRGEILFADMMTKEQYTEVAKKILANPHYVETGMVFSGQKARYILKGA 120

Query: 121 NPDYIEFITHYYDNLQKVSHFEDVD-DIIFKVTANFTEETVRQAEEWVNQAIPYATAVTT 179
+ +YI+ HYY N++ ++ FED++ D IFKV+ NFT TV + +W+NQA+ PYATAVTT
Sbjct: 121 SEEYIQKTKHYYAWKVINGFEDMENDAIFKVSTNFTGHTVLEGSDWLNQALPYATAVTT 180

Query: 180 GFKSIDIILSSVNKRNGLEHLCEQYGIRAEEVLSFGDNINDLEMLEWSGKAIATENARPE 239
GF SIDIIL VNK G+EHLC+ GI+ E ++FGDN ND +MLE++G+AIATENARPE
Sbjct: 181 GFDSIDIILKEVNKGFGMEHLCQALGIKXAETIAFGDNFNDYQMLEFAGRAIATENARPE 240

Query: 240 VKEIADCIIGHHNNQAVMAYLESMV 264
+K I+D +IGH N+ AV+ YL+ +V
Sbjct: 241 IKVISDQVIGHCNDGAVLTYLKGLV 265

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
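The "Identities" figure reported for an alignment is the number of aligned columns in which the Query and Sbjct rows carry the same residue. The sketch below illustrates that count on the final block of the GAS/GBS alignment above (Query 240-264, Sbjct 241-265). It is a hedged illustration only: the function name `count_identities` is hypothetical, and the Sbjct row is typed here with the scanning spaces removed, which is an assumption about the intended sequence.

```python
def count_identities(query, sbjct, gap="-"):
    """Count aligned columns where both rows show the same residue.

    Both rows must already be aligned (equal length); gap columns never
    count as identities.
    """
    if len(query) != len(sbjct):
        raise ValueError("aligned rows must have equal length")
    return sum(1 for q, s in zip(query, sbjct) if q == s and q != gap)


# Final alignment block of this example; the Sbjct row is an assumed
# clean reading of the scanned text.
query = "VKEIADCIIGHHNNQAVMAYLESMV"
sbjct = "IKVISDQVIGHCNDGAVLTYLKGLV"
identical = count_identities(query, sbjct)
print(identical, len(query))
```

Summing this count over every block of an alignment and dividing by the total number of aligned columns gives the overall identity fraction quoted in the summary line.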

Example 357 

A DNA sequence (GBSx0388) was identified in S.agalactiae <SEQ ID 1159> which encodes the amino 
acid sequence <SEQ ID 1160>. Analysis of this protein sequence reveals the following:

N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2428 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics.

Example 358 

A DNA sequence (GBSx0389) was identified in S.agalactiae <SEQ ID 1161> which encodes the amino 
acid sequence <SEQ ID 1162>. This protein is predicted to be p115 protein (smc). Analysis of this protein 
sequence reveals the following:

Possible site: 55

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -4.99 Transmembrane 1092 - 1108 ( 1088 - 1110)

Final Results

bacterial membrane --- Certainty=0.2996 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



A related GBS nucleic acid sequence <SEQ ID 9713> which encodes amino acid sequence <SEQ ID 9714> 
was also identified.

The protein has homology with the following sequences in the GENPEPT database:

>GP:CAB13467 GB:Z99112 chromosome segregation SMC protein homolg 
[Bacillus subtilis]

Identities = 458/1193 (38%), Positives = 728/1193 (60%), Gaps = 27/1193 (2%)





[Alignment blocks up to Query 889: most residue rows are not recoverable from the source. Block start positions (Query / Sbjct): 1 / 1; 61 / 61; 121 / 121; 181 / 181; 241 / 241; 301 / 301; 361 / 361; 421 / 420; 480 / 480; 540 / 540; 600 / 600; 660 / 660; 718 / 720; 776 / 780; 833 / 840; Query 889. Surviving match lines, in order:]

M D+IFAG+++RK LN A+V++TLDN DHF+ EV V RR++R+G+SE+LI+ 4
L+DI DLFMD+GLG+++FSIISQG+VE I 4-SK E+RR+1FEEAAGVLKYKTRKK+ ++K
L +TQ NL+R+EDI++EL+ QV+PL+ QASIAK +L +E 4
+T++EK+ ++E
h +Y+ +S L+
++ +++ LE++ 4 S FY GVK VL+AK++LGGI GAV E +S 4
3 S+QH++ +DE +A+++I +LK4N GRATFLPL+ 1+ K
EL+++ K L 4

Query: 833 EITLSEIKRDISNLQTLLSHQDSQLDKSELPRIEKQLLQVKNRRENDEEKLVSLRF 888
E+ L E K D+S L + +S S E++L + 4 ND+ K 4 L
Sbjct: 840 LTETEIiALKEAKEDLSFLTSEMSSSTSG EEKLEEAaKHKIiNDKrKTIELIA 890

Sbjct: 891 LRRDQRIKLQHGLDTYERELKEMKRLYKQKTTLLKDEEVKLGRMEVELDNLLQYLREEYS 950

Query: 945 MTLDEAKVKANVLEDILMAREQLKSLQAKIKALGPVNIDAIAQFEEVHERLTFLNTQRDD 1004
++ + AK K + D AR+++K ++ I+ LG VN+ +I +FE V+ER FL+ Q++D
Sbjct: 951 LSFEGAKEKYQLETDPEEARKRVKLIKLAIEELGTVNLGSIDEFERVNERYKFLSEQKED 1010

Query: 1005 LVHAKNLLLETITDMDDEVKTRFKSTFEAIRHSFKETFVQMFGGGSADLILTE-GDLLSA 1063
L AKN L + I +MD+E+ RF TF IR F + F +FGGG A+L LT+ DLL +
Sbjct: 1011 LTEAKNTLFQVIEEMDEEMTKRFNDTFVQIRSHFDQVFRSLFGGGRAELRLTDPNDLLHS 1070

Query: 1064 GVDISVQPPGKKIQSLNLMSGGEKALSAIALLFAIIRVKTTPFVILDEVEAALDEANVKR 1123
GV+I QPPGKK+Q+LNL+SGGE+AL+A+ALLF+I++V+ +PF +LDEVEAALDEANV R
Sbjct: 1071 GVEIIAQPPGKKLQNLNLLSGGERALTAIALLFSILKVRPVPFCVLDEVEAALDEANVFR 1130

Query: 1124 FGDYLNRFDKSSQFIVVTHRKGTMSAADSIYGVTMQESGVSKIVSVKLKEAQE 1176
F YL ++ +QFIV+THRKGTM AD +YGVTMQESGVSK++SVKL+E +E
Sbjct: 1131 FAQYLKKYSSDTQFIVITHRKGTMEEADVLYGVTMQESGVSKVISVKLEETKE 1183

A related DNA sequence was identified in S.pyogenes <SEQ ID 1163> which encodes the amino acid 
sequence <SEQ ID 1164>. Analysis of this protein sequence reveals the following:

Possible site: 15

N-terminal signal sequence

Transmembrane 1092 - 1108 ( 1088 - 1110)

Final Results

bacterial membrane --- Certainty=0.2996 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases:

>GP:CAB13467 GB:Z99112 chromosome segregation SMC protein homolg 
[Bacillus subtilis]

Identities = 441/1192 (36%), Positives = 729/1192 (60%), Gaps = 25/1192 (2%)



[Alignment blocks up to Query 414: the residue rows are not recoverable from the source. Block start positions (Query / Sbjct): 1 / 1; 61 / 61; 121 / 121; 181 / 181; 241 / 241; Query 301; 354 / 354; Query 414. Surviving match lines, in order:]

L+DI DLFMD+GLG+++FSIISQG+VEEI +SK E+RR+ 1 FEEAAGVLKYKTRKK+ ^
j +TQDNL+R+EDI++EL+ Q+ E
E+K A ++ EQL+E +
L+ +A++ N+L
h ++ +R Y+ N+

Sbjct: 414 QRLADNNEKHLQERHDISARKAACETEFARIEQEIHSQVGAVRDMQTKYEQKKRQYEKNE 473

Query: 474 ERLFDLLDQKGKEARKASLESIQKSHSQFYAGVRAVLQSQKKLGGIIGAVSEHLSFDSD 533 

L+ + ++K LE++Q S FY GV+ VL+++++LGGI GAV E +S + 

Sbjct: 474 SRLYQAYQYVQQ2UISKKDMLETMQGDFSGFYQGVKEVLKAKERLGGIRGAVLELISTEQK 533 

Query: 534 YQTALEVALGANSQHIIVTDEAAAKRAIAYLKKHRQGRATFLPLTTIKARSLSEHYHRQL 593 

Y+TA+E+ALGA++QH++ DE +A++A.I YLK+N GEATFLPL+ ItKl 
Sbjct: 534 YETAIEIALGASAQHVVTDDEQSARKAIQYLKQNSFGRATFLPLSVIRDRQLQSRDAETA 593 

Query: 594 ATCEGYLGTAESLIRYDDSLSAIIQNLLSSTA1FETIDQANIAARLLGYKVRIVTLDGTE 653 

A +LG A L+ +D + ++IQNLL + I E + AH A+LLG4+ RIVTL+G 
Sbjct: 594 ARHSSFLGVASELVTFDPAYRSVIQNLLGTVLITEDLKGANELAKLLGHRYRIVTLEGDV 653 

Query: 654 LRPGGSFSGGANRQSNTTFI- -KPELEQISEELTRLVEQLKITEKEVAALQSDLIAKKEE 711 

+ PGGS +GGA ++ N + + ELE +++ L + E+ + E+EV L+ + +++ 
Sbjct: 654 WPGGSMTGGAVKKKNNSLLGRSRELEDVTKRIAEMEEKTALLEQEVKTLKHSIQDMEKK 713 

Query: 712 LTQLKLAGDQARLAEQ--RAQMAYQQLQEKQEDSKALI1AALDQSQTTHSDESLLAEQARI 769 

L L+ G+ RL +Q + Q+ Q+ EK ++ L ++S + SDE + ++ 

Sbjct: 714 LADLRETGEGLRLKQQDVKGQLYELQVAEKlvIINTHLELYDQEKSALSESDEERKVRKRKL 773 

Query: 770 

EE L+A+++K L DID 

Sbjct: 774 

Query: 830 SRLRTQLKQCQQNILKLESILNNNVSQDSIQRLPQWQKQLQDATEHKSGAQKRLVQLRFE E 



Query: 890 IEDYE7ARLEETAEKI TKESEKNDT F I RRQTKL ETHLEQVANRLRAYAKSLSEDFQM 945 

D +L+ + +E ++ +++T L E L ++ L + L E++ + 

Sbjct: 892 RRDQRIKLQHGLDTYERELKEMKRLYKQKTTLLKDEEVKLGRMEVELDNLLQYLREEYSL 951 

Query: 946 TLADAKEVTNSIDHLESAKEKLHHLQKTIRALGPINSDAINQYEETOERLTFLTSQKTDL 1005 

+ AKE E A++++ ++ I LG +N +I+++E V+ER FL+ QK DL 

Sbjct: 952 SFEGaKEKYQLETDPEEJU^KRVKLIKLAIEELG'TO.ILGSIDEFERVMERYKFLSEQKEDL 1011 

Query: 1006 TKAKNLLLETINSMDSEVKARFKVTFEAIQKSFKETFTQMFGGGSADLVLTE-TDLLSAG 1064 

T+AKN L + I MD E+ RF TF 1+ F + F +FGGG A+L LT+ DLL +G 
Sbjct: 1012 TFJUOWLFQVIEEMDEEMTKRFNDTFVQIRSHFDQVFRSLFGGGRAELRLTDPNDLLHSG 1071 

Query: 1065 IEISVQPPGKKIQSLNLMSGGEKALSALAIiFAIIRWTIPFVILDEVEAALDFJWVKRF 1124 

+EI QPPGKK+Q+LNL+SGGE+AL+A+ALLF+I++V+ +PF +LDEVEAALDEANV RF 
Sbjct: 1072 VEIIAQPPGKKLQNIiNLLSGGEFALTAIALLFSILKVRPVPFCVLDEVEAALDEANVFRF 1131 

Query: 1125 GDFLNRFDKDSQFIWTHRKGTMAAADSIYGITMQESGVSKIVSVKLKEAQE 1176 

+L ++ D+QFIV+THRKGTM AD +YG+TMQESGVSK++SVKL+E +E 
Sbjct: 1132 AQYLKKYSSDTQFIVITHRKGTMEEADVLYGVTI1QESGVSKVISVKLEETKE 1183 

An alignment of the GAS and GBS proteins is shown below: 

Identities = 732/1179 (62%) , Positives = 911/1179 (77%) 
Sbjct: 1 

Query: 61 MPDVIFAGTEHRKPLNYAQVSVTLDNSDHFIEKIADEVRVERRIFRNGDSEYLIDGRKVR 120 

MPDVI FAGT+NR PLNYA+V+V LDNSDHFI+ E+RVER I+RNGDS+YLIDGRKVR 
Sbjct: 61 MPDVIFAGTQNRNPL^AKVAVVLDNSDHFIKTAKKEIRVEP^IYRNGDSDYLIDGRKVR 120 

Query: 121 LRDIHDLFMDTGLGRDS FS 1 1 SQGRVEAI FNS KPEERRAI FEEAAGVLKYKTRKKETQSK 180 

LRDIHDLFMDTGLGRDSFSI ISQGRVE I FNS KPEERRAI FEEAAGVLKYKTRKKETQ K 
Sbjct: 121 LRDIHDLFMDTGLGRDSFSIISQGRVEEIFNSKPEERRAIFEEAAGVLKYKTRKKETQIK 180 

Query: 181 LEQTQGNLDRLEDIIYELDMQVQPLEKQASIAKRFLVLDEERQGLHLSILIEDILQHQSD 240
L QTQ NLDRLEDIIYELD Q+ PLEKQA +AK+FL LD R+ L L IL++DI Q
Sbjct: 181 LNQTQDNLDRLEDIIYELDTQMPLEKQAKVAKQFLELDANRKQLQLDILVKDIDIAQER 240

Query: 241 LTTVEEKLLTWKELaTYYQQRQSLEDEKQSLKQKRHHLSEEIEAKQLALLDVTKLKSDL 300 

T EL ++++LA+YY +RQS+E++ Q KQK+ LS+E + Q LL++TKL +DL 
Sbjct: 241 QTKDTEMiRALQQDIASYYAKRQSMEEDYQKFKQKKQVLSQESDQTQTTLLELTKLIADL 300 

Query: 301 ERQIDLIRLESNQKAEKKEEAGQRLAELEIKAKDCSDQITQKNIELTTLSEKIAQIRSEI 360 

E+QI+L++LES Q+AEKK EA + L +L+ + + Q +L 4- +++ ++ ++ 

Sbjct: 301 EKQIELVKLESGQEAEKKAEAKKHLEQLQEQLDGFQAEEKQCTEQLLHIDQQLCDVKQQL 360 



Sbjct: 361 
Query: 421 

Q + L+ L ++ A ++A K+ V LL +YQ+ + +Q LE +Y+ Q LFD L 
Sbjct: 421 QLLVTKLDQLMDESQKAQAHYKAQKEQVEMLLQWYQEGDKRVQELERDYQLNQERLFDLL 480 

Query: 481 DEIKSKQARISSLESILKNHSNFYAGVKSVLQAKDQLGGIIGAVSEHLSFDKHYQTALE1 540 

D+ K K+AR +SLESI K+HS FYAGV++VLQ++ +LGGIIGAVSEHLSFD YQTALE+ 
Sbjct: 481 DQKKGKEARKASLESIQKSHSQFYAGVRAVLQSQKKLGGIIGAVSEHLSFDSDYQTALEV 540 

Query: 541 ALGGSSQHIIVEDESAAKRSIAFLKKNRQGRATFLPLTTIKPRELAQHYLSKLQSSQGFL 600 

ALG +SQHIIV DE+AAKR+IA+LKKNRQGRATFLPLTTIK R L++HY +L + +G+L 
Sbjct: 541 ALGANSQHIIVTDEAAAKRAIAYLKKNRQGRATFLPLTTIKARSLSEHYHRQLATCEGYL 600 

Query: 601 GIASELVTYDQRLSNIFKtmLGLTAIFDTVDNANVAARQLNYQVRLVTLDGTELRPGGSY 660 

G A L+ YD LS I +N L TAIF+T+D AN+AAR L Y+VR+VTLDGTELRPGGS+ 
Sbjct: 601 GTAESLIRYDDSLSAIIQNLLSSTAIFETIDQAK1IAARLLGYKVRIVTLDGTELRPGGSF 660 

Query: 661 SGGANRQNNTVFIKPELDNLKKELKQAQSKQLIQEKEVATLLEQLKEKQETIAQLKNDGE 720 

SGGANRQ+NT FIKPEL+ + +EL + +1 EKEVA L L K+E L QLK G+ 
Sbjct: 661 SGGANRQSfJTTFIKPELEQISEELTRLVEQLKITEKEVAALQSDLIAKKEELTQLKIAGD 720 

Query: 721 QARLEEQRADIEYQQLSEKLADMKLYNGLQLSSGALEQTTSENEKNRLEKELEQFAIKK 780 

QARL EQRA + YQQL EK D L L S + E+ R+E+ L A KK 

Sbjct: 721 QARLAEQRAQMAYQQIiQEKQEDSKALLAALDQSQTTHSDESLLAEQARIEEALTAIAKKK 780 

Query: 781 EELTTSIAQIKEDKDSIQEKVKNLTTLLSEAQLEERDLLriEQKFERAWCTRLEITLSEIK 840 

LT I IKE+KD I++K N+ LS+A+L+ERDLLNE+KFE+AN +RL L + + 
Sbjct: 781 NALTCDIDDIKENKDLIRQKTQNIHQALSQARLQERDLLNEKKFEQANQSRLRTQLKQCQ 840 

, Query: 841 RDlSNLQTLLSHQDSQLDKEEIiPRIEKQLLQVNNRREHDEEKLVSLRFELEDCEAALDDL 900 
++I L+++L++ SQ + LP+ +KQL + +++LV LRFE+ED EA L++ 

Sbjct: 841 QNILKLESILNNNVSQDSIQRLPQWQKQLQDATEHKSGAQKRLVQLRFEIEDYEARLEET 900 

Query: 901 AASIAKEGQKNESLIRQQAQLESQCEQLSQQLMIFSRQLSEDYQMTLDEAKVKANVLED1 960 



Query: 961 LMAREQLKSLQAKIKALGPWIDAIAQFEEVHSRLTFLNTQRDDLVHAKNLLLETITDMD 1020 

A+E+L LQ I+ALGP+N DAI Q+EEVHERLTFIi +Q+ DL AKNLLLETI MD 
Sbjct: 961 ESAKEKLHHLQKTIRALGPINSDAINQYEEVHERLTFLTSQKTDLTICAKNLLLETINSMD 1020 

Query: 1021 DEVKIRFKSTFEAIRHSFKETFVQMFGGGSADLI LTEGDLLSAGVDI SVQPPGKKIQSLN 1080 

EVK RFK TFEAI+ SFKETF QMFGGGSADL+LTE DLLSAG++ISVQPPGKKIQSLN 
Sbjct: 1021 SEVKARFKVTFEAIQKSFKETFTQMFGGGSADLV1TETDLLSAGIEISVQPPGKKIQSLN 1080 

Query: 1081 LMSGGEKALSAIjALLFAIIRWTIPFVILDEVEAALDKA3WKRFGDYLNRFDKSSQFIVV' 1140 

LMSGGEKALSALALLFAIIRVKTIPFVILDEViaUVLDEANVKRFGD+LNRFDK SQFIW 
Sbjct: 1081 LMSGGEKALSALALLFAI 1RVKT I PFVI LDEVEAALDEANVKRFGDFLNRFDKDSQFIVV 1140 



Query: 1141 THRKGTMSAADSIYGTCMQESGVSKIVSVKLKEAQEMTN 1179

THRKGTM+AADSIYG+TMQESGVSKIVSVKLKEAQEMTN
Sbjct: 1141 THRKGTMAAADSIYGITMQESGVSKIVSVKLKEAQEMTN 1179

SEQ ID 1162 (GBS199) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 52 (lane 2; MW 75kDa).

GBS199-GST was purified as shown in Figure 208, lane 3.

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics.

Example 359 

A DNA sequence (GBSx0390) was identified in S.agalactiae <SEQ ID 1165> which encodes the amino 
acid sequence <SEQ ID 1166>. This protein is predicted to be ribonuclease III (rnc). Analysis of this 
protein sequence reveals the following:

Possible site: 46

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3372 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9711> which encodes amino acid sequence <SEQ ID 9712> 
was also identified.

The protein has homology with the following sequences in the GENPEPT database:

>GP:CAB13466 GB:Z99112 ribonuclease III [Bacillus subtilis]
Identities = 115/230 (50%), Positives = 154/230 (66%), Gaps = 1/230 (0%)





Query: 13 KKMKELRSKLEKDYGIVFANQELLDTAFTHTSYANEHRLLNISHNERLEFLGDAVLQLLI 72
KK+++ + E+ + F N++LL AFTH+SY NEHR NERLEFLGDAVL+L I
Sbjct: 15 KKVEQFKEFQER-ISVHFQNEKLLYQAFTHSSYVNEHRKKPYEDNERLEFLGDAVLELTI 73

Query: 73 SQYLFTKYPQKAEGDLSKLRSMIVREESLAGFSRLCGFDHYIKLGKGEEKSGGRNRDTIL 132
S++LF KYP +EGDL+KLR+ IV E SL + F + LGKGEE +GGR R +L
Sbjct: 74 SRFLFAKYPAMSEGDLTKLRAAIVCEPSLVSLAHELSFGDLVLLGKGEEMTGGRKRPALL 133

Query: 133 GDLFEAFLGALLLDKGVEVVHAFVNKVMIPHVEKGTYERVKDYKTSLQELLQSHGDVKID 192
D+FEAF+GAL LD+G+E V +F+ + P + G + V D+K+ LQE +Q G ++
Sbjct: 134 ADVFEAFIGALYLDQGLEPVESFLKVYVFPKINDGAFSHVMDFKSQLQEYVQRDGKGSLE 193

Query: 193 YQVTNESGPAHAKEFEVTVSVNQENLSQGIGRSKKAAEQDAAKNALATLQ 242
Y+++NE GPAH +EFE VS+ EL G GRSKK AEQ AA+ ALA LQ
Sbjct: 194 YKISNEKGPAHITOEFEAIVSLKGEPLGVGNGRSKKEAEQKAAQEALAKLQ 243





A related DNA sequence was identified in S.pyogenes <SEQ ID 1167> which encodes the amino acid 
sequence <SEQ ID 1168>. Analysis of this protein sequence reveals the following:

Possible site: 32

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1414 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below:

Identities = 170/227 (74%), Positives = 192/227 (83%)

Query: 15 MKELRSKLEKDYGIVFANQELLDTAFTHTSYANEHRLLNISHNERLEFLGDAVLQLLISQ 74

Query: 75 YLFTKYPQKAEGDLSKLRSMIVREESLAGFSRLCGFDHYIKLGKGEEKSGGRNRDTILGD 134
YLF KYP+K EGD+SKLRSMIVREESLAGFSR C FD YIKLGKGEEKSGGR RDTILGD
Sbjct: 61 YLFAKYPKKTEGDMSKLRSMIVREESLAGFSRFCSFDAYIKLGKGEEKSGGRRRDTILGD 120

Query: 135 LFEAFLGALLLDKGVEVVHAFVNKVMIPHVEKGTYERVKDYKTSLQELLQSHGDVKIDYQ 194
LFEAFLGALLLDKG++ V F+ +VMIP VEKG +ERVKDYKT LQE LQ+ GDV IDYQ

Query: 195 VTNESGPAHAKEFEVTVSVNQENLSQGIGRSKKAAEQDAAKNALATL 241
V +E GPAHAK+FEV++ VN LS+G+G+SKK AEQDAAKNALA L
Sbjct: 181 VISEKGPAHAKQFEVSIVVNGAVLSKGLGKSKKLAEQDAAKNALAQL 227

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
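
The alignment summaries in these examples all follow the pattern "Identities = a/n (x%) , Positives = b/n (y%)". As a minimal sketch, assuming that pattern holds, the counts can be pulled out of such a line and turned into plain fractions. The function name, the regular expression and the dictionary keys are hypothetical and not part of the original analysis; the printed integer percentages are not recomputed or checked here, because the rounding convention of the alignment program is not stated in this document.

```python
import re

# Hypothetical helper (an assumption for illustration, not the tool used in
# the source): extract the raw counts from a BLAST-style summary line.
SUMMARY_RE = re.compile(
    r"Identities\s*=\s*(\d+)/(\d+).*?Positives\s*=\s*(\d+)/(\d+)"
)


def parse_summary(line):
    """Return identity/positive counts and their fractions of the length."""
    m = SUMMARY_RE.search(line)
    if m is None:
        raise ValueError("not a summary line: %r" % line)
    ident, ident_len, pos, pos_len = map(int, m.groups())
    return {
        "identities": ident,
        "positives": pos,
        "length": ident_len,
        "identity_fraction": ident / ident_len,
        "positive_fraction": pos / pos_len,
    }


# Summary line of the GAS/GBS alignment shown above.
summary = parse_summary("Identities = 170/227 (74%) , Positives = 192/227 (83%)")
print(summary["identities"], summary["positives"], summary["length"])
```

Keeping the raw counts next to the fractions makes it easy to compare alignments of different lengths without relying on the rounded figures.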

Example 360 

A DNA sequence (GBSx0391) was identified in S.agalactiae <SEQ ID 1169> which encodes the amino 
acid sequence <SEQ ID 1170>. Analysis of this protein sequence reveals the following: 

Possible site: 43 

>>> Seems to have a cleavable N-term signal seq.

INTEGRAL Likelihood = -4.19 Transmembrane 100 - 116 ( 99 - 117)
INTEGRAL Likelihood = -2.44 Transmembrane 81 - 97 ( 81 - 97)

Final Results

bacterial membrane --- Certainty=0.2678 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>
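
Each "Final Results" block lists three compartments with a certainty score and a verdict. The following is a minimal sketch of how such a block could be read programmatically and the highest-scoring compartment selected. The parser, its regular expression and the function names are assumptions made for illustration; the pattern is deliberately tolerant of stray spaces inside the number (for example "0. 2678"), which occur in scanned copies of this output.

```python
import re

# Hypothetical parser for the "Final Results" lines; names and pattern are
# assumptions for this sketch, not part of the original prediction program.
RESULT_RE = re.compile(
    r"bacterial\s+(\w+)\W*Certainty\s*=\s*([\d.\s]+?)\s*\(([^)]+)\)"
)


def parse_final_results(lines):
    """Return {compartment: (certainty, verdict)} for every matching line."""
    results = {}
    for line in lines:
        m = RESULT_RE.search(line)
        if m:
            compartment, number, verdict = m.groups()
            # Remove stray spaces inside the number, e.g. "0. 2678".
            certainty = float(number.replace(" ", ""))
            results[compartment] = (certainty, verdict.strip())
    return results


def best_location(results):
    """Compartment with the highest certainty score."""
    return max(results, key=lambda name: results[name][0])


example = [
    "bacterial membrane --- Certainty=0.2678 (Affirmative) < succ>",
    "bacterial outside --- Certainty=0.0000 (Not Clear) < succ>",
    "bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>",
]
parsed = parse_final_results(example)
print(best_location(parsed), parsed["membrane"])
```

For the block above this selects the membrane entry, which is also the only one marked "Affirmative".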

The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAC12789 GB:AJ279090 hypothetical protein [Staphylococcus 
carnosus] 

Identities = 50/114 (43%) , Positives = 72/114 (62%) 

Query: 3 KIFYISLGFISLGIGIAGIVLPWPTTPLVLLSAFCFSRSSEKFDIWLRQTKVYICYYAAD 52 

K ++LG I GIG GIV+P++PTTP +LL+A CFSRSS+KF4 WL TK++ Y 
Sbjct: 2 KYVLMTLGLIFAGIGFVGIWPLLPTTPFLLLAAICFSRSSKKFNRWLVNTKIHDEYVES 61 



Query: 63 FVESRSIAPARKKSMIWQIYILMGIS] 

F + +K ++ +YILMGISI+ +++++ LLI V T VLF V 

Sbjct: 62 FKRDKGFTLKKKFKLLTSLYILMGIS I FI IBNLYIRITLLIMLFVQTWLFTFV 115 

No corresponding DNA sequence was identified in S. pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.



Example 361 

A DNA sequence (GBSx0392) was identified in S.agalactiae <SEQ ID 1171> which encodes the amino 
acid sequence <SEQ ID 1172>. Analysis of this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1908 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related DNA sequence was identified in S.pyogenes <SEQ ID 1173> which encodes the amino acid
sequence <SEQ ID 1174>. Analysis of this protein sequence reveals the following:

Possible site: 45 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1610 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 225/269 (83%) , Positives = 248/269 (91%) 

Query: 1 MSEIGFKYSILASGSTGNCFYlETPQKRLIjIDAGLTGKKVTSLtAEINRKPEDLDAILVT 60 

M+E GFKYSIIASGSTGNCFY+ETP4KRLLIDAGLTGKK+TSLLAEI+RKPEDLDAIL+T 
Sbjct: 1 MNESGFKYSIIASGSTGNCFYLETPKKRLLIDAGLTGKKITSI.LAEIDRKPEDLDAILIT 60 

Query: 61 HEHSDHIKGVGVLARKyHLDIYANEQTWKVMDER^lLGKVDVSQKHVFGRGKTLTFGDLD 120 

HEHSDHIKGVGV+ARKYHLDIYANE+TW++MDE NMLGK+D SQKH+F R K LTFGD+D 
Sbjct: 61 HEHSDHIKGVGVMARK^HLDIYANEKTWQLMDEC»1LGKI,DASQKHIFQRDKVLTFGDVD 120 

Query: 121 IESFGVSHDAVDPQFYRMMKDDKSFVMLTDTGYVSDRMAGLIENADGYLIESNHDIEILR 180 
I 

Sbjct: 121 I 
Query: 181 £ 

SGSYPW+LKQRILSD GHLSNEDG+ MIR++G TK IYLGHLSKENNIKELAHMTM N 
Sbjct: 181 SGSYPWSLKQRILSDLGHLSNEDGAGAMIRSLGYNTKKIYLGHLSKENNIKELAHMTMVN 240 

Query: 241 NLMRADFGVGTDFSVHDTSPDSATPLTRI 269 

L AD VGTDF+VHDTSPD+A PLT I 
Sbjct: 241 QLAMADLAVGTDFTVHDTSPDTACPLTDI 269 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 362 

A DNA sequence (GBSx0393) was identified in S.agalactiae <SEQ ID 1175> which encodes the amino 
acid sequence <SEQ ID 1176>. Analysis of this protein sequence reveals the following:

Possible site: 26

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood =-11.94 Transmembrane 15 - 31 ( 5 - 34)

Final Results

bacterial membrane --- Certainty=0.5776 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related DNA sequence was identified in S.pyogenes <SEQ ID 1177> which encodes the amino acid
sequence <SEQ ID 1178>. Analysis of this protein sequence reveals the following:

Possible site: 26

>>> Seems to have a cleavable N-term signal seq.

Final Results

bacterial outside --- Certainty=0.3000 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below:

Identities = 335/443 (75%) , Positives = 392/443 (87%) 

Query: 7 NIRSFELALLFLLVFVAVYFVYIAVRDFKMSKNIRLIiNWKVRDLIAGNYSDSILIQGDAD 66 

N+ +FELA+L LLVFVA YF++LAVRD++ ++ IR+++ K+RDLI G Y+D I + D + 
Sbjct: 8 NLSTFEIAILILLVFVABYFIHIAWDYRNARIIRMMSHKIRDLINGRYTDIIDEKADIE 67 

Query: 67 LVELGESLNDLSDVFRMRHDNLEQEKNRIiAS ILTYMTDGVLATDRSGKIVMINETRQQQP 126 

L4-EL + LNDLSDVFR+ H+NL QEKNRLASIL YM+DGVLATDRSGKI +MINETA+ +Q 
Sbjct: 68 L^LSDQIOTLSDVFRLTHENIAQEKlffilASIIAYMSDGVLATDRSGKIIMINETARKQL 127 

Query: 127 l^YDEALSmiVDMLGSGSPYSFQDLVSKTPEVVLNRRDENGEFVTLRlRFALNRRESG 186 

NL+ +EAL NI D+L + Y+++DLVSKTP V +N R++ GEFV+LR+RFALNRRESG 
Sbjct: 128 NLSKEE1ALKKNITDLLEGDTSYTYRDLVSKTPVVTTOSRKTDMGEFVSLRLRFALNRRESG 187 

Query: 187 FISGLVAVSHDATEQEKEERERRLFVSNV3EELRTPLTSVKSYLEALDEGALNEEVAPSF 246 

FISGLV V HD TEQEKEERERRLFVSNVSHELRTPLTSVKSYLEALDEGAL E++APSF 
Sbjct: 188 FISGLVWLHDTTEQEKEERERRLFVSNVSEELRTPLTSVICSYLEALDEGALKEDIAPSF 247 

Query: 247 IKVSLDETNPJffl^ISDLLSLSRIDNEVTHLDVEMTNFTAFMTSII^FDQIRNQKTVTG 306 

IKVSLDETNRMMRMISDLL+LSRIDN+VT L VEMTNFTAF+TS I LNRFD ++NQ T TG 
Sbjct: 248 IKVSLDETITOIMRMISDLLNLSRIDNQOT^ 307 

Query: 307 KVYEIVRDYPLI<SIWVEIDTDKMTQVIDNIIJ^VKYSPDGGKITVNLRTTICTQMILSIS 366 

KVYE I VRDYP+ S+W+EID DKMTQVI+NILKNA+KYSPDGGKITV ++TT TQ+l+SIS 
Sbjct: 308 KWYEIvPJ3YPITSWIEIDM)KMTQVIENILNNAIKYSPDGGKITVRMKTTDTQLIISIS 367 

Query: 367 DQGLGI PKKDLPL I FDRFYR VDKARSRKQGGTGLGLS I AKE I VKQHKGPI WAKSEYGKGS 426 

DQGLGIPK DLPLI FDRFYRVDKARSR QGGTGLGL+IAKEI+KQH GFIWAKS+YGKGS 
Sbjct: 36S DQGLGIPKTDLPLIFDRFYRVDKARSRAQGGTGLGLAIAKEIIKQHHGFIWAKSDYGKGS 427 

Query: 427 TFTIVLPYDKDAVTYEEWEDVED 449 

TFTIVLPY+KDA YEEWE+ D 
Sbjct: 423 TFTIVLPYEKDAAIYEEWEEDVD 450 

A related GBS gene <SEQ ID 8561> and protein <SEQ ID 8562> were also identified. Analysis of this
protein sequence reveals the following:

Lipop: Possible site: -1 Crend: 8
McG: Discrim Score: 8.59
GvH: Signal Score (-7.5): -3.38

Possible site: 26
>>> Seems to have an uncleavable N-term signal seq
ALOM program count: 1 value: -11.94 threshold: 0.0

INTEGRAL Likelihood =-11.94 Transmembrane 15 - 31 ( 5 - 34)
PERIPHERAL Likelihood = 8.27 178
modified ALOM score: 2.89

*** Reasoning Step: 3

Final Results

bacterial membrane --- Certainty=0.5776 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

67.5/83.5% over 439aa 

Streptococcus pneumoniae 
GP | 5830524 | histidine kinase Insert characterized 

ORF01458(331 - 1647 of 1947) 

GP|5830524|emb|CAB54569.1| |AJ006392(10 - 449 of 449) histidine kinase {Streptococcus
pneumoniae}

%Match =45.6
%Identity =67.5 %Similarity =83.4

Matches = 297 Mismatches = 70 Conservative Sub.s = 70

126 156 186 216 246 276 306 335 

5 ITSPFSDTyRTSHDTRTFIGNSLGI*LFWRCPYS*CDGETFT*KE'*RYSWSSRIYFDSTWCRXIT*SLMNNSAANIRSPE 

I 

MLDLLKQTIFT 
10 

10 366 396 426 450 480 510 540 570 

I^LFLLVFVAWFVYLAVRDFKMSKNIRL--I^KV^ 

:s|:|:s: :| : ||:| :| Ihllllhll : :|| ::: : :|||||:| |: ::|||| 

RDFIFILlLLGFILWTLLLLENRRDNIQLKQWQKVTOLIAGDYSKmDKQGGSEITNITNNIMJLSEVIRLTQENLEQ 
30 40 50 60 70 80 90 

15 

600 630 660 690 720 750 780 810 

EKNRLASILTYMTDGVIATDRSGKIWUNETAQQQFNIAYDE 

I II III 111111111 = 1 hUllhll::!: I =: h =| = ::| I : : | | : : : : | | : : | : :| 111 
ESKRLNSILFYMTDGVLAT^RGQIIMINDTAKKQLGLVKEDvXMSILELLKIEENYELRDLITQEPELLLDSQDINGE 

20 110 120 130 140 150 160 170 

840 870 900 930 960 990 1020 1050 

FVTLRIRFAIjNRRESGFISGLvAVSHDATEQEKEERERRLFVSWSHELRTPIiTSVKSYLEALDEGALNEEVAPSFIKVS 

= = ii = imi miiiimm ii iiiiiiiiiiiiiiiiiiiiiiMiiiiiiiiiiiiiiii i in inn 

25 YliNI^WFALIRRESGFISGLVAVLHDTTEQEKEERERRLFVSOTSffi^ 

190 200 210 220 230 240 250 

1080 1110 1140 1170 1200 1230 1260 1290 

LDETNRMMRMISDLLSLSRIDNEVTHLDVEMTOT^ 
30 ||||||l|||::||| llllll =11111= llll|:| I I I I I I = = : I I 11 = 11111: 111 = 11111111 

LDETNRMMRIWTDLIiHLSRIDNATSHLDVELINFTAFITFIIiNRFDKy.KGQ- -EKEKKYELVRDYPINSIWMEIDTDKMT 
270 280 290 300 310 320 

1320 1350 1380 1410 1440 1470 1500 1530 

35 QVIDNILNNAVKYSPDGGKITVNLRTTKTQMILSISDQGLGIPKKDLPLIFDRFYRVDKARSRKQGGTGLGLSIAKEIVK 

imiiiiiimmiimi = = n= iniiiimimimi mmm = im 11111111111111 = 1 

QWDNILNNAIKY'SPDGGKITVRMKTTEDQMILSISDHGLGIPKQDLPRIFDRFYRVDRARSRAQGGTGLGLSIAKEIIK 
340 350 360 370 380 390 400 

40 1560 1590 1620 1647 1677 1707 1737 1757 

QHKGFIWAKSEYGKGSTFTIVLPYDKDAVTYEEWED-VED*HMSEIGFKYSILASGSTGNCFYIETPQKRLLIDAGI)TGK 

I I urn i inn urn ii i urn iii i m m 

QHKGF IWAKSEYGKGSTFTI VIiPYDKDAVKEFAWEDEVED 
420 430 440 

45 

SEQ ID 1176 (GBS41) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 4 (lane 7; MW 50kDa), in Figure 168 (lane 2-4; MW 65kDa - thioredoxin fusion) 
and in Figure 238 (lane 4; MW 65kDa). It was also expressed in E.coli as a GST-fusion product. SDS- 
PAGE analysis of total cell extract is shown in Figure 13 (lane 7; MW 75kDa). 

Purified Thio-GBS41-His is shown in Figure 244, lane 10.

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 363 

A DNA sequence (GBSx0394) was identified in S.agalactiae <SEQ ID 1179> which encodes the amino 
acid sequence <SEQ ID 1180>. This protein is predicted to be VicR protein (regX3). Analysis of this
protein sequence reveals the following: 

Possible site: 60 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2754 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related DNA sequence was identified in S.pyogenes <SEQ ID 1181> which encodes the amino acid
sequence <SEQ ID 1182>. Analysis of this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2754 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below: 

Identities = 205/236 (86%) , Positives = 221/236 (92%) 

Query: 1 MKKILIVDDEKPISDIIKFNLTKEGYETATAFDGREALVQYAEFQPDLIILDLMLPELDG 60

MKKILIVDDEKPISDIIKFNLTKEGY+ TAFDGREA+ + E +PDLI ILDLMLPELDG 
Sbjct: 1 MKKILIVDDEKPISDIIKFNLTKEGYDIVTAFDGREAVTIFEEEKPDLIILDLMLPELDG 60 



Query: 61 LEVAKEVRKTSHIPIIMLSAKDSEFDKVIGLEIGADDYVTKPFSNRELLARVKAHLRRTE 120 

LEVAKE+RKTSH+PIIMLSAKDSEFDKVrGLEIGADDYVTKPFSNRELLARVKAHLRRTE 
Sbjct: 61 LEVAKEIRKTSHVPIIMLSAKDSEFDKVIGLEIGADDYVTKPFSNRELLARVKAHLRRTE 120 

Query: 121 NIETAVAEESAQNASSDITIGELQILPDAFIAKKRGEEIELTHREFELLHHLATHIGQVM 180 

IETAVAEE+A + + ++TIG LQILPDAF+AKK G+E+ELTHREFELLHHLA H+GQVM 
Sbjct: 121 TIETAVAEENASSGTQELTIGlttQILPDAFVAKimGQEVELTHREFELLHHLANHMGQvM 180 

Query: 181 TREHLLETVWGYDYFGDVRTVDVTVRRLREKIEDTPGRPEYILTRRGVGYYMKSYE 235 

TREHLLE VWGYDYFGDVRTVDVTVRRLREKIEDTP RFEYILTRRGVGYYMKSY+ 
Sbjct: 181 TREHLLEIWGYDYFGDWTVDWVRRLREKIEDTPSRPEYILTRRGVGYYMKSYD 236 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.



Example 364 

A DNA sequence (GBSx0395) was identified in S.agalactiae <SEQ ID 1183> which encodes the amino
acid sequence <SEQ ID 1184>. This protein is predicted to be amino acid ABC transporter, ATP-binding
protein. Analysis of this protein sequence reveals the following:

Possible site: 43

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3791 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database:

>GP:CAB14701 GB:Z99118 glutamine ABC transporter (ATP-binding 
protein) [Bacillus subtilis] 
Identities = 149/244 (61%) , Positives = 200/244 (81%) , Gaps = 2/244 (0%) 

Query: 3 LISYKWVNKYYGDYHALRQINLEIEPGQVVOTjLGPSGSGKSTLIRTMNALESIDDGSLW 62

+I+++NVNK+YGD+H L+QINL+IE G+VW++GPSGSGKSTL+R +N LESI++G L V 
Sbjct: 1 MITFQNVNKHYGDFHVLKQINLQIEKGEVWIIGPSGSGKSTIjLRCINRLESINEGVLTV 60 
Query: 63 NGHELANISSKELV1&RKEVGIWFQHFNLYPHKTVLENITLAPIKVLKQSKKEAMEIAEK 122

NG + N ++ +R+ +GMVFQHF+LYPHKTVL+NI LAP+KVL+QS ++A E A 
Sbjct: 61 NGTAI-NDRKTDINQTOQNIGMVFQHFHLYPHKTVLQ^IIMLAPVKVLRQSPEQAKETARY 119 

Sbjct: 

Query: 183 Qia^MTOGMNMVWrHEMGFAREWOJRIIFT^^ 242 

+ LA +GM MVWTHEMGFA+EVADRI+F+ +G+IL + +F+ NP+E RA+ FLS 

Sbjct: 180 KTLAKEGMTMWVTHEMGFAKEVADRIVFIDEGKILESAVPA-EFYANPKEERARLFLSR 238 

Query: 243 IINH 246 
I+NH 

Sbjct: 239 ILNH 242 
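
The "Identities" and "Positives" figures quoted with each alignment can be related back to the three-line blocks themselves: the middle line repeats a residue where query and subject agree and shows "+" for a conservative substitution. The sketch below counts both from one block, assuming the three strings are already column-aligned and free of scanning damage. The function name is hypothetical and the code is an illustration, not the program that produced the alignments.

```python
def count_matches(query, midline, sbjct):
    """Count identities and positives in one aligned block.

    All three strings must have the same length. A column is an identity
    when query and subject carry the same residue, and a positive when it
    is an identity or the midline marks it with '+'.
    """
    if not (len(query) == len(midline) == len(sbjct)):
        raise ValueError("aligned strings differ in length")
    identities = sum(
        1 for q, s in zip(query, sbjct) if q == s and q != "-"
    )
    positives = sum(
        1
        for q, m, s in zip(query, midline, sbjct)
        if (q == s and q != "-") or m == "+"
    )
    return identities, positives


# Final block of the alignment above (Query 243-246, Sbjct 239-242).
print(count_matches("IINH", "I+NH", "ILNH"))
```

Summing these per-block counts over a whole alignment gives the numerators of the summary line, provided every block is intact.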

A related DNA sequence was identified in S.pyogenes <SEQ ID 1185> which encodes the amino acid 
sequence <SEQ ID 1186>. Analysis of this protein sequence reveals the following: 

Possible site: 51 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3763 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 131/243 (53%), Positives = 179/243 (72%), Gaps = 2/243 (0%)

Query: 2 SLISYKNVNKYYGDYHALRQIlttEIEPGQVWLLGPSGSGKSTLIRTWNALESIDDGSLV 61 

++IS K+++KYYG !■+ I+L+I PG+VW++GPSGSGKSTL+RTMN LE G + 
Sbjct: 5 AI ISIKDIiHKYYGHNEVLKGIDIjDIMPGEVWI IGPSGSGKSTLLRTMNLLEVPTKGQIR 64 

Query: 62 WGHELANISSKELVNLRKEVGMVFQHFNLYPHKTVLENITLAPIKVLKQSKKEAME1AE 121 

G ++ + ++ ++R4-++GMVFQ FNL+P+ T+LENITL+PIK +K EA + A 
Sbjct: 65 FEGIDITD-KKNDIFS^KEKMG^W•FQQFNiFP^OTrILF2JITLSPIKTKGMAKAEADKTAL 123 

Query: 122 KYLKFVM^WRKDSYPSMLSGGQKQRIAIARGI J A^WPKLLLFDEPTSALDPETIGDVLSV 181 

L V + E+ +YP+ LSGGQ+QRIAIARGLAM P +LLFDEPTSALDPE +G+VL+V 
Sbjct: 124 SLLDKVGLSEKAKAYPASLSGGQQQRIAIARGLAMDPDVLLFDEPTSALDPE^1VGEVLAV 183 

Query: 182 MQIQ1ANIXJ^I^IIWVVTHEMGFAREVADRIIFMADGEIIJVDTTDVQDFFDKPREPRAICQFLS 241

MQ LA GM MV+VTHEMGFA+EVADR++FM DG ++V+ FD +E R K FLS 

Sbjct: 184 MQDLAKSGMTMVIVTHEMGFAICEVADRVMFM-DGGVIVEEGSPNQLFDLTKEERTKDFLS 242 

Query: 242 Nil 244 

Sbjct: 243 RVL 245 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 365 

A DNA sequence (GBSx0396) was identified in S.agalactiae <SEQ ID 1187> which encodes the amino 
acid sequence <SEQ ID 1188>. This protein is predicted to be glutamine-binding. Analysis of this protein 
sequence reveals the following: 



Final Results

bacterial outside --- Certainty=0.3000 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database:

>GP:CAB73178 GB:AL139076 probable ABC-type amino-acid transporter 
periplasmic solute-binding protein [Campylobacter
jejuni]

Identities = 99/240 (41%) , Positives = 141/240 (58%) , Gaps = 3/240 (1%)

Query: 1 MLRRKRLTFYLLSCIFIFLLFYPNSTSANQLSEIKKSGVLKVGVKQDVPNFGYYNAETNQ 60 

H+RKL + + + F + + +L IK G L VGVK DVP++ + T + 
Sbjct: 1 MVFRKSLLKIAVFALGAOTAFSNANA^EGKLESIKSKGQLIVGVKNDVPHYALLDQATGE 60 

Query: 61 YEGMEIDIAKKIAKSL GVKPVFVPTTAQTREPLMDNGQIDILIATYTITPERKANyN 117

+G E+D+AK +AKS+ K V A+TR PL+DNG +D +IAT+TITPERK YN 

Sbjct: 61 IKGFEVDVAKLIAKSILGDDKKIKLVAVNAKTRGPLLDNGeVDAVIATFTITPERKRIYN 120 

Query: 118 ISKAYYHDEIGFLTOKNSHIKTIKELDGKHIGVAO^TTKVNLEKYAKEHKLKFSYAQLG 177 
S+ YY D IG LV K K++ ++ G +IGVAQ ATTK + + AK+ + +++

Sbjct: 121 FSEPYYQDAIGLLVLKEKKYKSIADMKGANIGVAQAATTKKAIGEAAKKIGIDVKFSEFP 180 

Query: 178 SFPEIAISLYANRIDAFSTOKSILSGYLSPHTTILKEGFNTQEYGIATSKQDKVLIPYVN 237 
+P + +L A R+DAFSVDKSIL GY+ + IL + F Q YGI T K D YV+ 
Sbjct: 181 DYPS I KAALDAKRVDAFSYDKS I LLGYVDDKSEILPDS FEPQSYGI VTKKDDPAFAKYVD 240

A related DNA sequence was identified in S.pyogenes <SEQ ID 1189> which encodes the amino acid
sequence <SEQ ID 1190>. Analysis of this protein sequence reveals the following:

Possible site: 30
>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -6.16 Transmembrane 17 - 33 ( 15 - 35)

Final Results

bacterial membrane --- Certainty=0.3463 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related sequence was also identified in GAS <SEQ ID 9097> which encodes the amino acid sequence
<SEQ ID 9098>. Analysis of this protein sequence reveals the following:

>>> May be a lipoprotein

Final Results

bacterial membrane --- Certainty=0.000 (Not Clear) < succ>
bacterial outside --- Certainty=0.000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 66/251 (26%) , Positives = 111/251 (43%) , Gaps = 27/251 (10%) 

Query: 23 PNSTSANQLSEIKKSGvLKVGVKQDVPNFGYYNAETNQYEGMEIDIAKKIAKSLGVKPVF 82

P+ + + IK+ GVLKV +YN + N+ G E+D+ K+I K L +K F 

Sbjct: 34 PHQSQKSSWDT I KEKGVLKVATPGTYQPTSFYN-DNOT3LVGYEVDMV KE IGKRLNI KVKF 92 

Query: 83 VPTTAQTREPLMDNGQIDILIATYTITPERKANYNISKAYYHDEIGFLVR KNSHIK 138 

V T +D+G++DI + + ITP+R+ YNIS Y + G +VR N K

Sbjct: 93 VETGFDQAFTSVDSGRVDISIiNNFDITPKRQKKYNISTPYKYGVGGMIVRADGSSNIAKK 152 

Query: 139 TIKELDGKHIGVAQGATTKVNLEKYAKEHJXKFSYAQLGSFPELAISLYANRI 191 

+ + GK AG +K A+L ++ + +Y N + 

Sbjct: 153 DLSDWKGKKAAGASGTEYMKVAQKQG AELVTYDNVTGDVYLNDVANGRTDF 203

Query: 192 --DAFSVDKSILSGYLSPHTTILKE GENTQEYGIATSKQDKVLIPYVNKLLVSWEK 245 
+ + K + LS + + + +N E GI +K+D L ++ ++ K

Sbjct: 204 IPNDYPAQKLFVDYMLSQNPNLJSIVKMSDVQVNPTEQGIVMNKKDDSLKKKIDAVIKDMIK 263 

Query: 246 DGSLKHIYQKF 256 

DGSLK I + + 
Sbjct: 264 DGSLKKISETY 274 

SEQ ID 1188 (GBS136) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 29 (lane 5; MW 29.9kDa). 

The GBS136-His fusion product was purified (Figure 200, lane 6) and used to immunise mice. The 
resulting antiserum was used for FACS (Figure 284), which confirmed that the protein is immunoaccessible 
on GBS bacteria. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 366 

A DNA sequence (GBSx0397) was identified in S.agalactiae <SEQ ID 1191> which encodes the amino 
acid sequence <SEQ ID 1192>. This protein is predicted to be integral membrane. Analysis of this protein 
sequence reveals the following: 

Possible site: 55 

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -9.34 Transmembrane 32 - 48 ( 27 - 55)
INTEGRAL Likelihood = -5.04 Transmembrane 200 - 216 ( 196 - 219)
INTEGRAL Likelihood = -3.13 Transmembrane 93 - 109 ( 93 - 113)
INTEGRAL Likelihood = -2.02 Transmembrane 74 - 90 ( 74 - 92)

Final Results

bacterial membrane --- Certainty=0.4736 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>
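
The INTEGRAL lines above each give a likelihood value, a predicted transmembrane segment and, in parentheses, a wider range around it. A minimal sketch of reading these lines into tuples is shown below; the regular expression and function name are assumptions for illustration and only the first residue range on each line is extracted. In the four lines above every listed segment spans 17 residues, which the final loop makes visible.

```python
import re

# Hypothetical parser; the pattern and names are assumptions for this sketch.
INTEGRAL_RE = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+\.\d+)\s+Transmembrane\s+"
    r"(\d+)\s*-\s*(\d+)"
)


def parse_integral(lines):
    """Return (likelihood, start, end) tuples, lowest likelihood first."""
    segments = []
    for line in lines:
        m = INTEGRAL_RE.search(line)
        if m:
            segments.append(
                (float(m.group(1)), int(m.group(2)), int(m.group(3)))
            )
    return sorted(segments)


example = [
    "INTEGRAL Likelihood = -9.34 Transmembrane 32 - 48 ( 27 - 55)",
    "INTEGRAL Likelihood = -5.04 Transmembrane 200 - 216 ( 196 - 219)",
    "INTEGRAL Likelihood = -3.13 Transmembrane 93 - 109 ( 93 - 113)",
    "INTEGRAL Likelihood = -2.02 Transmembrane 74 - 90 ( 74 - 92)",
]
segments = parse_integral(example)
for likelihood, start, end in segments:
    # Segment length in residues, counting both end points.
    print(likelihood, start, end, end - start + 1)
```

A length check of this kind is also a cheap way to spot digits that were misread during scanning, since a segment whose end precedes its start cannot be correct.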

The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAB73177 GB:AL139076 putative ABC-type amino-acid transporter 
permease protein [Campylobacter jejuni] 
Identities = 112/226 (43%) , Positives = 160/226 (70%) , Gaps = 3/226 (1%) 

Query: 5 NISPFAISRWGAFFNHFDLFFKGFLYTLGISFGALLLALILGILSGGLSTSKSKVGKLIS 64 

+ ISPFA+ ++ ++ D F GF+YTL +S ALL+A I G + G ++TS+ K+ + + 
Sbjct: 25 SISPFAWKFLDALDNKDAFINGFIYTLEVSILALLIATIFGTIGGVMATSRFKIIRAYT 84 

Query: 65 RIYVEVFQNTPLLVQMVFVYYGLAIISNGHVI/IISAFFTAVLCVGLYHGAYISEVIRSGIE 124 

RIYVE+FQN PL++Q+ F++Y L ++ + + F VL VG YHGAY+ SEV+RSGI 
Sbjct: 85 RIYVELFQNVPLVIQIFFLFYALPVliG IRLDIFTIGVLGVGAYHGAYVSEWRSGIL 141 

Query: 125 AVPKGQTFJUU^QGFTANQTMQLIILPQATOTILPPMTNQVVNLIKNTSTVAIISGADIM 184 

AVP+GQ EA+ +QGFT Q M+ II+PQ +R ILPPMTNQ+VNLIKNTS + 1+ GA++M 
Sbjct: 142 AVPRGQFEASASQGFTYIQQMRYIIVPQTIRIILPPMTNQMVNLIKNTSVLLIVGGAELM 201 

Query: 185 FVAKAWAYDTTNYIPAFAGAAIFYFVICFPLASWARKQEELNKKTY 230 

A ++A D NY PA+ AA+ YF+IC+PLA +A+ E KK + 
Sbjct: 202 HSADSYAADYGNYAPAYI FAAVLYF 1 1 CYPLAYFAKAYENKLKKAH 247 

A related DNA sequence was identified in S. pyogenes <SEQ ID 1193> which encodes the amino acid 
sequence <SEQ ID 1194>. Analysis of this protein sequence reveals the following: 

I- term signal seq. 
INTEGRAL Likelihood = -6.26 Transmembrane 307 - 323 ( 303 - 327)
INTEGRAL Likelihood = -5.89 Transmembrane 485 - 501 ( 479 - 502)
INTEGRAL Likelihood = -1.12 Transmembrane 375 - 391 ( 375 - 391)

Final Results

bacterial membrane --- Certainty=0.3506 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

>GP:BAA17584 GB:D90907 glutamine-binding periplasmic protein
[Synechocystis sp.]

Identities = 146/532 (27%) , Positives = 244/532 (45%) , Gaps = 59/532 (11%) 

Query: 6 YMKKLILSCLVALALLFGGMSRAQANQYLRVGMK^YAPFNWTQDDASNGAVPIEGTSQY 65 

Y L L L+A+A+ + Q + VE+PFT ETQ 
Sbjct: 16 YYLLLALGVLLAIAIPLLPAFSQVSRQTIIVATEPTFPPFEMTD EATGQL 65 

Query: 66 ANGYDVQVAiaCVAICAMNKELLVVKTSWTGLIPALTSGKIDMIAAGMSPTKERRNEISFSN 125 

G+DV + + + +A + + + G4IPAL S + + ++ T ER +SFS+ 
Sbjct: 66 T-GFDTOLIQAIGEAAQVTVDIQGYPFDGIIPALQSNTVGAAISAITITPERAQSVSFSS 124 

Query: 126 SSYTSQPVTjVVTANGICYADATSLKDFSGAKVTAQQGVWHVNLLTQLKGAKLQTPMGDFSQ 185 

+ S VL + +LKD G ++ G + T + GAK+ T + 

Sbjct: 125 PYFKS - - VTAIAVQDGNDTIK^KDLEGKRMVAIGTTGAMVATNVPGAKV- TNFDS ITS 181 

Query: 186 MRQALTSGVIDAYISERPEAMTAEAADSRLKMITLKKGFAVAESDARIAVGMKKNDDRMA 245 

Q L 4G DA I++RP + A D+ L+ + + +E IA+ + + 
Sbjct: 182 ALQELVNGNAI^VINDRPVLLYA-IKDMLRNVTQSADVG-SEDYYGIAMPIAPPGE 236 

Query: 246 TVNQVLEGFSQTDRMALMDDMVTKQPVEKKAEDAKASFLGQMWAI FKGN 294 

+NQ E +Q ++++ EK + FL + G 

Sbjct: 237 -INQTREVMQ-GLFQIIENGTYNAIYEKWFGEKNPPFLPLVAPSLVGKVGTAQSLTERS 294 

Query: 295 WKQFLRGTGMTLLISMVGTITGLFIGLLIGIFRTAPKAKHKVAALGQK 342 

++ +G+ +T+L++ GL G + I + K 
Sbjct: 295 QANPNDNFLITLFRNLFKGSILTVLLTAFSVFFGLIGGTGVAIALISDI K 344 

Query: 343 LFGWLLTIYIEIFRGTPMIVQSMVIYYGTAQAF GISIDRTIAAIFIVSINTGAYM 397 

+ IY+E FRGTPM+VQ +IY+G F GI+IDR AAI +S+N AY+ 

Sbjct: 345 PLQLIFRIYVEFFRGTPMLVQLFIIYFGLPALFKEIGLGITIDRFPAAIIALSLNVAAYL 404 

Query: 398 SEITOGGIFAVDKGQFKAATALGFTHGQTMRKIVLPQVVRNILPATGNEFVINIKDTSVL 457 

+EI+RGGI ++D+GQ++A +LG + QTM+4++ PQ R ILP GNEF+ IKDTS+ 
Sbjct: 405 AEIIRGGIQSIDQGQWEACESLGMSPWQTMKEVIFPQAFRRILPPLGNEFITLIKDTSLT 464 

Query: 458 NVI SWELYFSGNTVATQTYQYFQTFTI IAI IYFVLTFTVTRILRYIERRFD 509 

VI EL+ G + TY+ F+ + +A++Y +LT + + +++E D 
Sbjct: 465 AVIGFQELFREGQLIVATTYRAFEVYIAVALVYLLLTTISSFVFKWLENYMD 516 

An alignment of the GAS and GBS proteins is shown below: 

Identities = 82/210 (39%), Positives = 113/210 (53%), Gaps = 12/210 (5%) 

Query: 14 WGAFFNHFDLFFKGFLYTLGISFGALLLALILGILSGGLSTS---KSKVGKL 1 63 

W F ++ F +G TL IS + L +G+L G T+ K KV L + 
Sbjct: 288 WAIFKGNWKQFLRGTGMTLLISMVGTITGLFIGLLIGIFRTAPKAKHKVAALGQKLFGWL 347 

Query: 64 SRIYVEVFQNTPLLVQMVFVYYGLAIISNGHVMISAFFTAVLCVGLYHGAYISEVIRSGI 123 

IY+E+F+ TP++VQ + +YYG A +1 A+ V + GAY+SE++R GI 

Sbjct: 348 LTI YIEI FRGTPMIVQSMVIYYGTAQAFG- - ISIDRTIAAI FIVSINTGAYMSEIVRGGI 405 

Query: 124 EAVPKGQTEAAlAQGFTANQTMQLIILPQATOTILPPMINQVVmiKNTSTVAIISGADI 183 

AV KGQ +AA A GFT QTM+ I+LPQ VR ILP N+ V IK4-TS + +IS ++ 
Sbjct: 406 FAVDKGQFKAATAI^FTHGQTMRKIVLPQVVRNILPATGl^FVINIKDTSVLNVISVVEL 465 

Query: 184 MFVAKAWAYDTTNYIPAFAGAAIFYFVICF 213 
F A T Y F AI YFV+ F

Sbjct: 466 YFSGNTVATQTYQYFQTFTI IAI IYFVLTF 495 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 367 

A DNA sequence (GBSx0398) was identified in S.agalactiae <SEQ ID 1195> which encodes the amino 
acid sequence <SEQ ID 1196>. This protein is predicted to be amino acid ABC transporter, permease 
protein. Analysis of this protein sequence reveals the following: 

Possible site: 39

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood = -6.95 Transmembrane 25 - 41 ( 16 - 42)
INTEGRAL Likelihood = -3.61 Transmembrane 66 - 82 ( 65 - 86)
INTEGRAL Likelihood = -2.44 Transmembrane 184 - 200 ( 182 - 201)
INTEGRAL Likelihood = -0.59 Transmembrane 119 - 135 ( 119 - 135)

Final Results

bacterial membrane --- Certainty=0.3781 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAB14704 GB:Z99118 glutamine ABC transporter (integral membrane 
protein) [Bacillus subtilis] 
Identities = 84/206 (40%), Positives = 129/206 (61%), Gaps = 6/206 (2%) 

Query: 10 ILFLLQGFGLTLYISFISIIiLSi^FGTIiLAIMRNSKNPIWKLIASIYIEFTOOTPNLLWI 69

+ FL GF +TLY++FISI+LS FFG + +R +K P+ + ++ +E +RN+P LL I 
Sbjct: 12 IAFLWDGFLVT1.YVAFISIILSFFFGLIAGTLRYAKVPVLSQLIAVLVETIRNLPLLLII 71 

Query: 70 FIIFLVF QMKSVSAGITSFTIFTSAAIAEIIRGGLNGVDKGQTEAGLSQGFTYLQ 124 

F F +++ +A IT+ TIF SA L+EIIR GL +DKGQ EA S G 4-Y Q 

Sbjct: 72 FFTFFALPEIGIKLEITAAAITALTIFESAMLSEIIRSGLKSIDKGQIEAARSSGLSYTQ 131 

Query: 125 VFI III FPQAFRKMLPAI I SQFVTVI KDTSLLYSVIAIQEI FGKSQI LMGRYFEAGQVFT 184 

1+ PQA R+M+P I+SQF++++KDTSL VIA+ E+ +QI+ G+ + F 
Sbjct: 132 TLFFIVMPQALRRMVPPIVSQFISLLKDTSLAV-VIALPELIHNAQIINGQSADGSYFFP 190 

Query: 185 LYAIITAVYFITNFIISSFSRKLSKR 210 

++ + +YF N+ +S +R+L R 
Sbjct: 191 IFLLAALMYFAVNYSLSLAARRLEVR 216 

A related DNA sequence was identified in S.pyogenes <SEQ ID 1197> which encodes the amino acid 
sequence <SEQ ID 1198>. Analysis of this protein sequence reveals the following: 

Possible site: 20 
>>> Seems to have a cleavable N-term signal seq.

INTEGRAL Likelihood =-10.51 Transmembrane 529 - 545 ( 517 - 551)
INTEGRAL Likelihood =-10.30 Transmembrane 697 - 713 ( 693 - 719)
INTEGRAL Likelihood = -4.41 Transmembrane 560 - 576 ( 555 - 585)
INTEGRAL Likelihood = -0.32 Transmembrane 662 - 678 ( 652 - 678)

Final Results

bacterial membrane --- Certainty=0.5203 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

>GP:BAA17584 GB:D90907 glutamine-binding periplasmic protein
[Synechocystis sp.]

Identities = 133/475 (32%) , Positives = 251/475 (52%) , Gaps = 27/475 (5%) 

Query: 273 IVSDSSFAPFEFQN-GKGKYVGIDIEL1KA1AKQC2GFKIEIANPGFDAALKAVQSSQADG 331 

+ ++ +F PFE + G+ G D++LI+AI + ++I FD + A+QS+ 

Sbjct: 46 VATEPTFPPFEMTDEaTGQLTGFDVDLIQAlGEAAQVTVDIQGYPFDGIIPALQSNTVGA 105 

Query: 332 VIAGATITDARKAIFDFSDPYYTSNIILAVKAGKH-IKNVEDLDRKTVGAKNGTSSYSWL 390 

1+ TIT R FS PY+ S + +AV+ G + IKN +DL+ K + GT+ + + 

Sbjct: 106 AISAITI TPERAQS VS FSS PYFKS VLAI AVQDGNDT I KNLKDLEGKRIAVAIGTTG - AMV 164 

Query: 391 KEMAPKYGYNVKAFDDGSSITOSIJSrSGS 1 /DAIMDDFAVLKYAISQG--RRFETPlliEGIST 448 

N P G V FD +S L +G+ DA+++D VL YAI R + + S 

Sbjct: 165 ATNVP--GAKVTNFDSITSALQELVNGNADAVINDRPVLLYAIKDAGLRNVKISADVGSE 222 

Query: 449 GEVGFAVKKGTNPEI,I---EMFNNGLAALKKSGQYDD-IDKYLDSKKA ATPSEKG 500 

G A+ E+ E+ N GIi + ++G Y+ I +K+ K PS G 

Sbjct: 223 DYYGIAMPLAPPGEINQTREVLNQGLFQIIENGTYNAIYEKWFGEKNPPFLPLVAPSLVG 282 

Query: 501 - ADESTISGI.LSNNYKQLIAGLGTTLSLTLISFAIMIIGIIFGMMAVSP 549 

+ + L ++ L G T+ LT S +1 G + +S 

Sbjct: 283 KVGTAQSLTERSQANPNDNFLITLFRNLFKGSILTVLLTAFSVFFGLIGGTGVAIALISD 342 

Query: 550 TKSLRLISTVFVDWRGIPLMIVAAFIFWGVPNLIESMTGHQSPINDFLAATIALSLNGG 609 

K L4-LI ++V+ RG P+++ I++G+P L + + G 1+ F AA IALSLN 
Sbjct: 343 IKPLQLIFRIYVEFFRGTPMLVQLFIIYFGLPALFKEI-GLGITIDRFPAAIIALSUWA 401 

Query: 610 AYIAEIvRGGIEAVPAGQMEASRSLGLSYGTTMRKVILPQAVKLMLPNFINQFVISLKDT 669 

AY+AEI+RGGI+++ GQ EA SLG+S TM++VI PQA + +LP N+F+ +KDT 
Sbjct: 402 AYIAEIIRGGIQSIDQGQWEACESLGMSPWQTMKEVIFPQAFRRILPPLGIJEFITLIKDT 461 

Query: 670 TIVSAIGLVELFQTGKIIIARNYQSFRMYAILAIIYLIMIILLTRLAKRLEJCRLN 724 

++ + IG ELF+ G++I+A Y++F +Y +A++YL++ + + + K LE ++ 
Sbjct: 462 SLTAVIGFQELFREGQLIVATTYRAFEVYIAVALVYLLIiTTISSFVFKWLENYMD 516 
Identities = 68/247 (27%) , Positives = 106/247 (42%) , Gaps = 11/247 (4%) 
An alignment of the GAS and GBS proteins is shown below:

Identities = 68/210 (32%) , Positives = 113/210 (53%) , Gaps = 16/210 (7%) 

Query: 13 LLQGFGLTLYISFISILLSMFFGTLIAIMRIISKNPIWKLIASIYIEFTONVPNLLWIFII 72 

LL G G TL ++ IS +++ G + +M S +LI++++++ VR +P ++ I 

Sbjct: 517 LLAGLGTTLSLTLISFAIAI I IGI I FGMMAVSPTKSLRLI STVFVDWRGIPLMI VAAFI 576 

Query: 73 F LVFQMKSVSAGITSFTIFT SAALAEIIRGGLNGVDKGQTEAGLSQGF 120 

F L+M+IFT A +AEI+RGG+ V GQ EA S G 

Sbjct: 577 FWGVPNLIESMTGHQSPINDFLAATIALSLNGGAYIAEIWGGIEAVPAGQMEASRSLGL 636 
Query: 121 TYLQVFIIIIFPQAFRKMLPAIISQFVWIKDTSLLYSVIAIQEIFGKSQILMGRYFEAG 180

+Y +1 PQA + MLP I+QFV +KDT+++ S I + E-f-F +I++ R + 

Sbjct: 637 SYGTTMRKVILPQAVKLMLPNFINQFVISLKDTTIV-SAIGLWLFQTGKIIIARNY--- 692 

Query: 181 QVFTLYAIITAVYFITNFIISSFSRKLSKR 210 

Q F +YAI+ +Y I +++ +++L KR 
Sbjct: 693 QSFRMYAILAI IYLIMIILLTRLAKRLEKR 722 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 368 

A DNA sequence (GBSx0399) was identified in S.agalactiae <SEQ ID 1199> which encodes the amino 
acid sequence <SEQ ID 1200>. Analysis of this protein sequence reveals the following: 

Possible site: 39 

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood =-12.21 Transmembrane 7 - 23 ( 1 - 30)

Final Results

bacterial membrane --- Certainty=0.5883 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:BAB04094 GB:AP001508 unknown conserved protein in B. subtilis 
[Bacillus halodurans] 
Identities = 43/157 (27%), Positives = 83/157 (52%), Gaps = 9/157 (5%) 

Query: 25 YQSQFQKTTNQALAIAYKDAKVAKK- -DVTHQKIDKEFENFRGSYEIEFNTKSAEYSYHV 83 

+Q++ N+ L +A ++ + + + +K+ +N R YEIE EY + + 

Sbjct: 38 HQAESVSADNEGLTLAEASDIALERAGNGWTEAEKDRDNGRVVYEIEVKNDDDEYDFKI 97 

Query: 84 DVKTGQILERDMDNNGFSKSTSQSSSSSSQKSHKISQEEAKKIAFKDANIEESEVSNLKI 143 

D +TG+IL+ + SK SSS ++ IS +EAK+IA K+ + ++ ++++ 
Sbjct: 98 DQQTGEILKEKQEQRKGSKPREGHSSSKGSEA-VISMDEAKEIALKEVS GKIDDIEL 153 

Query: 144 KEEIENGKS VYDIDF - VDLKNKNEVDYQIDAETGKI I 179 

E ENG VY+++ D + ++V +DA TG ++ 
Sbjct: 154 - - ERF^GSLVYE\7EIESDHYDDDDVTVYVDAMTGNVL 188 

A related DNA sequence was identified in S.pyogenes <SEQ ID 1201> which encodes the amino acid 
sequence <SEQ ID 1202>. Analysis of this protein sequence reveals the following: 

Possible site: 57 



Final Results 

bacterial membrane --- Certainty=0.3060 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
An alignment of the GAS and GBS proteins is shown below: 

Identities = 37/96 (38%) , Positives = 63/96 (65%) , Gaps = 5/96 (5%) 

Query: 94 DMDNNGFSKSTSQSSSSSSQKSHKISQEEAKKIAFKDANIEESEVSNLKIKEEIENGKSV 153 

DMD+ +Q +S + K K+S+++AK IA KDA++ E++ L + ++ E+GK+V 

Sbjct: 59 DMDDKD- DHMDNQPKTSQTSKKVKLSEDKAKS IALKDAS VTEADAQMLSVTQDNEDGKAV 117 



WO 02/34771 



PCT/GB01/04789 






Query: 154 YDIDFVDLKKKN-EVDYQIDAETGKIIERSRDHMND 188 
Y+I+F +NK+ E Y IDA +G I+E+S + +ND 

Sbjct: 118 YEIEF QNKDQEYSYTIDANSGDIVEKSSEPIND 150 

Identities = 23/62 (37%), Positives = 37/62 (59%) 



Query: 35 NQALAIAYKDAKVAKKDVIHQKIDKEFENFRGSYEIEFNTKSAEYSYHVDVKTGQILERD 94 

++A +IA KDA V + D + ++ E+ + YEIEF K EYSY +D +G I+E+ 

Sbjct: 85 DKAKSIALKDASVTEADAQMLSVTQDNEDGKAVYEIEFQNKEQEYSYTIDANSGDIVEKS 144 


Query: 95 MD 96 
Sbjct: 145 SE 146 



A related GBS gene <SEQ ID 8563> and protein <SEQ ID 8564> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 9 
McG: Discrim Score: 14.45 
GvH: Signal Score (-7.5): -5.92 
Possible site: 39 

>>> Seems to have an uncleavable N-term signal seq 
ALOM program count: 1 value: -8.92 threshold: 0.0 

INTEGRAL Likelihood = -8.92 Transmembrane 7 - 23 ( 2-28) 
PERIPHERAL Likelihood = 10.93 37 
modified ALOM score: 2.28 
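
The INTEGRAL line reports a likelihood score and the residue range of the predicted transmembrane segment. A short sketch of how that line could be parsed is shown below. The field names in the returned dictionary are illustrative assumptions. The comment on the sign of the score reflects the reported value (-8.92) lying below the stated threshold (0.0) for a segment that is called INTEGRAL.

```python
import re

# Captures the likelihood and the first residue range on an INTEGRAL line.
INTEGRAL = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+(?:\.\d+)?)\s+Transmembrane\s+(\d+)\s*-\s*(\d+)"
)

def parse_integral(line):
    """Return likelihood, start, end and length of the predicted segment, or None."""
    match = INTEGRAL.search(line)
    if match is None:
        return None
    likelihood = float(match.group(1))
    start, end = int(match.group(2)), int(match.group(3))
    return {
        "likelihood": likelihood,  # below the 0.0 threshold in this report
        "start": start,
        "end": end,
        "length": end - start + 1,  # inclusive residue count
    }

seg = parse_integral("INTEGRAL Likelihood = -8.92 Transmembrane 7 - 23 ( 2-28)")
print(seg)
```

For the protein of SEQ ID 8564 this gives a 17-residue segment spanning residues 7 to 23.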



Reasoning Step: 3 

Final Results 

bacterial membrane --- Certainty=0.4557 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the databases: 

26.1/59.2% over 140aa 
Bacillus subtilis 

EGAD|107494| hypothetical protein Insert characterized 
GP|2632048|emb|CAA05607.1||AJ002571 YkoJ Insert characterized 
GP|2633682|emb|CAB13185.1||Z99110 similar to hypothetical proteins from B. subtilis Insert characterized 
PIR|F69859|F69859 conserved hypothetical protein ykoJ - Insert characterized 

ORF00925(379 - 852 of 1164) 
EGAD|107494|BS1329(29 - 169 of 170) hypothetical protein {Bacillus subtilis} 
GP|2632048|emb|CAA05607.1||AJ002571 YkoJ {Bacillus subtilis} 
GP|2633682|emb|CAB13185.1||Z99110 similar to hypothetical proteins from B. subtilis {Bacillus subtilis} 
PIR|F69859|F69859 conserved hypothetical protein ykoJ - Bacillus subtilis 
%Match = 6.2 
%Identity = 26.1 %Similarity = 59.2 

Matches = 37 Mismatches = 52 Conservative Sub.s = 47 
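
The reported %Identity and %Similarity can be related to the match counts with simple arithmetic. The sketch below assumes that similarity counts identical residues plus conservative substitutions, and that the denominator is the number of alignment columns including gaps. The value of 142 columns (136 residue pairs plus 6 gap columns) is an assumption, not a figure stated in the text. It is chosen because it reproduces the printed percentages.

```python
def percent(part, whole):
    """Percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)

# Counts as reported for the YkoJ comparison.
matches, mismatches, conservative = 37, 52, 47

# Assumption: 136 aligned residue pairs plus 6 gap columns.
aligned_columns = matches + mismatches + conservative + 6

identity = percent(matches, aligned_columns)
similarity = percent(matches + conservative, aligned_columns)
print(identity, similarity)
```

Under these assumptions the identity is 37/142 and the similarity is 84/142, matching the 26.1% and 59.2% given above.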



297 327 357 387 417 447 468 498 

NIIE* *KEGCCMIKKNKVFLEVI,LVLWILEGGVLFYQSQFQKTTNQAIAIAYKDAKVAKKDVIH- - -QKIDKEFENFRG 

I :| I == - =:= 1= >>|> I 1= = ||»: I : 

MLKKKWMVGLIiAGCLAAGGFSYNAFATBITOEIClQASSKTDA^ 

10 20 30 40 50 60 70 

528 558 588 618 648 672 702 732 

SYEIEFNTKSAEYSYHVDVKTGQILERDMDIMGFSKSTSQSSSSSSQKSHK--ISQEFAKKIAFKDANIEESEVSNLKIK 
11 = 1 = =1 >ll> 111= =1= = l::|||::|h| |: |: 

VYEVEIEKEGEDYDVYVDIHTKQALNDPL- - - KEKAEQVAITKEEAEEIALKQTG- - -GTVTESKLD 

90 100 110 120 130 

EEIENGKSVYDIDBTOLKNKNEVDYQIDAETGKIIER3RDHM-K*FK*DIKKRRSKRPSF*LLSSLLPTF*KFT*IOT*DD 

I: :| « I I »»l 1= Ml" I 

ED - -DGAYIYEME - IQTKQGTETEFEI SAKDGRI IKQEIDD 
140 150 160 170 

SEQ ID 8564 (GBS37) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 14 (lane 4; MW 22kDa). It was also expressed in E.coli as a GST-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 16 (lane 10; MW 47kDa). 
Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 369 

A DNA sequence (GBSx0400) was identified in S.agalactiae <SEQ ID 1203> which encodes the amino 
acid sequence <SEQ ID 1204>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.1499 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9709> which encodes amino acid sequence <SEQ ID 9710> 
was also identified. 

The protein has no significant homology with any sequences in the GENPEPT database. 

A related DNA sequence was identified in S.pyogenes <SEQ ID 1205> which encodes the amino acid 
sequence <SEQ ID 1206>. Analysis of this protein sequence reveals the following: 

Possible site: 42 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.2808 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 128/297 (43%) , Positives = 180/297 (60%) , Gaps = 9/297 (3%) 

Query: 54 IDD I KVGSP I FKYFWT - SLSLQAPLKALE FVLEQAKMPTELSGELSETQYLVAQFSDELA 112 

I D ++GSP F W Q+ + L F+L+ +MP ELSG+L ETQ L+ +F L 

Sbjct: 46 IIDNRLGSPTFWVIWPIEKENQSAICQLLTFLLDLVEMPFELSGQLHETQTLLTRFHPSLL 105 

Query: 113 PHDDFWIALSQVIYDSFPGNSLAEDTVLNRKLHQFRYLISSQQAQYVRRYFKDVGMTDRD 172 

P FW L+ ++ +FPG +L++ L ++LHQFRY+ ISSQQAQ +R ++K + MTD 
Sbjct: 106 PDHMFWKELASLVDQAFPGKTLSQAGELEKRLHQFRYVISSQQAQSIRNHYKMIEMTDAQ 165 

Query: 173 ALVNYL SCL-REPDSIAYYESARLHNKRRRNGEIFGFPDDEPVINSKBLISFHTE 226 

AL +L CL R+ +SARLHNK R FP E N K+L+ FHTE 

Sbjct: 166 ALALFLRSKKGPCLWRQAPDYTL^SARLKNKLRFEDNKV'IFPSQEVSYNIKVLLWFHTE 225 

Query: 227 FIIDDKGNFIOTIDAEVITRNGIINGASFNYAFKNOTRHKELDVDPVK-LDPKFRNDMTR 285 

F +D G FUffi+DAEV+T GI+NGASFNY + RH +LDVDP+ DP+FR D + 
Sbjct: 226 FTLDSTGFFLNEVDAEWTEKGIV 

Query: 286 GYRSPMLSRRKMFFFKEEDYDCSYFNI^GYYAFGRRSAKQSVDKQVKYLKKAVQKMR 342 

G+RSP R+WF +++D+ SYFN KG +A+ +S+ V K K K+ + ++ 
Sbjct: 285 GFRSPKRVFRQWFRAQKDDFMFSYFNAKGLFAYHNKSSFARVKKSAKQFKRQIHPIK 341 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 370 

A DNA sequence (GBSx0401) was identified in S.agalactiae <SEQ ID 1207> which encodes the amino 
acid sequence <SEQ ID 1208>. This protein is predicted to be similar to two-component response regulator 
[YcbM] (ompr-likeprotei). Analysis of this protein sequence reveals the following: 

Possible site: 31 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.3129 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAA55264 GB:X78502 gtcR [Brevibacillus brevis] 

Identities = 99/228 (43%) , Positives = 149/228 (64%) , Gaps = 3/228 (1%) 

Query: 2 rtvlwo^ddetiellrsylegalykwmasdgeeafslfqqhqidlaiiditlpkidgy 61 

+T+L+ + E IELL+ +LE Y+++ A DGE+A++ +QH +DLAIIDI +P +DG+ 
Sbjct: 3 KTILIADDEPEIIELLKLFLERESYRIIEAYDGEQAWNYIRQHPVDLAIIDIMMPALDGF 62 



Query: 62 ELTRLIRQDSQIPIIMLAAKTTDMDRILGLNIGADDFITKPFNSLEVLARINSQLRRYYE 121 

+L + + + ++P+I+L+AK D D+ILGL +GADDFI+KPFN LE +ARI +QLRR +E 
Sbjct: 63 QLIKRLTNEYKLPVIlLSAKNRDSDKILGLGLGADDFISKPFNPiEAVARIQAQLRRAFE 122 

Query: 122 FNSLAKP--KNQFIK1GELELDEEHVELTKNGKHIKLTATEFKILH1LMS-SPGRIYTKT 178 

FN + Q+GLL ++++T E+++L+ M S I+TK 

Sbjct: 123 FNEPEEKAISTQSTTVGRLTLDHTACWyRGDETYSVTPLEYRLLNTFMQCSRTSIFTKQ 182 

Query: 179 QLYEKINGRYLEGDETTIMVHISNIRDKIEDDSKYPKYIKTLRGVGYK 226 

QL+E+ D+ TIMV IS +RDKIED + P YIKT+RG+GYK 

Sbjct: 183 QLFEQAWSETYWEDDNTIMVQISRLRDKIEDQPRQPVYIKTVRGLGYK 230 



There is also homology to SEQ ID 1182: 

Identities = 87/230 (37%) , Positives = 144/230 (61%) , Gaps = 5/230 (2%) 

Query: 1 MRTVLWQGDDETIELLRSYLEGALYKVVI'IASDGEEAFSLFQQHQIDIAIIDITLPKIDG 60 

M+ +L+V + 4+++ L Y +V A DG EA ++F++ + DL I+D+ LP++DG 
Sbjct: 1 MKKILIVDDEKPISDIIKFNLTKEGYDIVTAFDGREAVTIFEEEKPDLIILDLMLPELDG 60 

Query: 61 YELTRLIRQDSQIPIIMLAAKTTDMDRILGLNIGADDFITKPFNSLEVLARINSQLRRYY 120 

E+ + IR+ S +PIIML+AK ++ D+++GL IGADD++TKPF++ E+LAR+ + LRR 
Sbjct: 61 LEVAKEIRKTSHVPIIMLSAKDSEFDKVIGLEIGADDYVTKPFSNRELLARVKAHLRRTE 120 

Query: 121 EFNSLAKPKN QFIKIGELELDEEHVELTKKGKHIKLTATEFKILHILMSSPGRIY 175 

+ +N Q + IG L++ + K+G+ ++LT EF++LH L + G++ 

Sbjct: 121 TIETAVAEFJJASSGTQELTIGJ&QILPDAFVAKKHGQEVELTHREFELLHHLANHMGQVM 180 



Query: 176 TKTQLYEKINGRYLEGDETTIMVHISNIRDKIEDDSKYPKYIKTIjRGVGY 225 
T+ L E + G GD T+ V + +R+KIED P+YI T RGVGY 

Sbjct: 181 TREHLLEIvWGYDYFGDVRTVDVTVRRLREKIEDTPSRPEYILTRRGVGY 230 



Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 371 

A DNA sequence (GBSx0402) was identified in S.agalactiae <SEQ ID 1209> which encodes the amino 
acid sequence <SEQ ID 1210>. This protein is predicted to be threonyl-tRNA synthetase 1 (thrS). Analysis 
of this protein sequence reveals the following: 



>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.2353 (Affirmative) < succ>
bacterial membrane 
bacterial outside 



The protein has homology with the following sequences in the GENPEPT database: 

>GP:BAB06860 GB:AP001517 threonyl-tRNA synthetase 1 [Bacillus halodurans] 
Identities = 413/638 (64%), Positives = 506/638 (78%), Gaps = 7/638 (1%) 

Query: 1 MIKITFPDGAIREFESGITTFEIAQSISNSLAKKAIAGKFNGQLIDTTRAIEEDGSIEIV 60 
MI ITFPDGA++EF G TT EIA SIS L KKALAG +G L+D IE+DG+I IV 
Sbjct: 4 MINITFPDGAVKEFPKGTTTAEIAGSISPGLKKKALAGMLDGTLLDIjNTPIEQDGTITIV 63 

Query: 61 TPDHEDALGVLRHSAAHLFAQAAKRLFPD--LCLGVGPAIQDGFYYDTDNKSGQISNDDL 118 
TP+ +4-AL VLRHS AH+ AQA KRLF D + LGVGP 1+ GFYYD D ++ +DL 
Sbjct: 64 TPESDEALEVLRHSTAHVMAQALKRLFKDRNVKLGVGPVIEGGFYYDVDMDES-LTPEDL 122 

Query: 119 PRIEEEMKKIVKENHPCIREEISKEEALELFKD--DPYKVELI SEHAEDG - LTVYRQGEF 175 
P+IE+EMKKI+ EN P R +S+EEAL +++ DPYK+ELI++ ED 4-T+Y QGEF 
Sbjct: 123 PKIEKEMKKIIGENLPIERWVSREEALARYEEVGDPYKIELIKDLPEDETITIYEQGEF 182 

Query: 236 AJCERDHRKLGKELDLFMVNPEVGQGLPFV/LPMGATIRREIjERYIVDKEIASGYQHVYTPP 295 
AKERDHRKLGKEL +F ++ +VGQGLP WLP GATIRR +ERYIVDKE GYQHVYTP 
Sbjct: 243 AKERDHRKLGKELGIFALSQKVGQGLPLWLPKGATIRRIIERYIVDKEEKLGYQHVYTPV 302 

Query: 356 AELGMMHRYEKSGALTGLQRWEMTLISnDAHIFVTPEQIKDEFLKAljNLIAEIYEDFNLTD 415 
AELG+MHRYE SGA++GLQRVR MTliNDAHIF P+QIKDEF++ + LI +YEDF L + 
Sbjct: 362 AELGLMHRYEMSGAVSGLQRVRGMTIiNDAHI FCRPDQI KDEFVRWRL IQAVYEDFGLKN 421 

Query: 476 TALGNEETLSTIQLDFLLPERFDLKYIGADGEEHRPIMIHRGGISTMERFTAILIETYKG 535 
TALG +ETLST+QLDFLLPERFDL Y+G DG+ HRP+++HRG +STMERF A L+E YKG 
Sbjct: 482 TALGKDETLSTVQLDFLLPERFDLTYVGEDGQPHRPWVHRGWSTMERFVAFLLEEYKG 541 

Query: 536 AFPTWLAPQQVSVIPISNEAHIDYAWEVARVLKDRGIRAEVDDRNEKMQYKIRAAQTQKI 595 
AFPTWLAP QV VIP+S EAH++YA V L+ GIR E+D+R+EK+ YKIR AQ QKI 
Sbjct: 542 AFPTWIAPVQVQVIPVSPEAHLEYAKKVQETLQQAGIRVEIDERDEKIGYKIREAQMQKI 601 


A related DNA sequence was identified in S.pyogenes <SEQ ID 1211> which encodes the amino acid 
sequence <SEQ ID 1212>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.2566 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below: 

Identities = 564/644 (87%), Positives = 608/644 (93%) 

Query: 1 MKITFPDGAIREFESGITTFEIAQSISNSIAKKA1AGKFNGQLIDTTRAIEEDGSIEIV 60 
MIKITFPDGA+REFESG+TTF+IA+SIS SLAKKALAGKFH QLIDTTRAIEEDGSIE1V 
Sbjct: 1 MIKITFPDGATOEFESGVTTFDIAESISKSLAKKAIjAGKFNDQLIDTTRAIEEDGSIEIV 60 

Query: 61 TPDHEDALGVLRHSAAHLFAQAAKRLFPDLCLGVGPAIQDGFYYDTDNKSGQISNDDLPR 120 
TPDH+DA VLRHSAAHLFAQAAKRLFP+L LGVGPAI +GFYYDTDN GQISN+DLPR 
Sbjct: 61 TPDHKDAYEVLRHSAAHLFAQAAKRLFPNLHLC-VGPAIAEGFYYDTDNAEGQISNEDLPR 120 

Query: 121 IEEEMKKIVKENHPCIREEISKEEALELFKDDPYKVELISEHAEDGLTVYRQGEFVDLCR 180 
IE EM+KIV EN+PCIREE++KEEALELFKDDPYKVELI+EHA GLTVYRQGEFVDLCR 
Sbjct: 121 IEAEMQKIVTEI^PCIREEVTKEFJ^ELFKDDPYKVELINEHAGAGLTVYRQGEFVDLCR 180 

Query: 241 HRKLGKELDLF1VIVNPEVGQGLPFWLPNGATIRRELERYIVDKEIASGYQHVYTPPMASVE 300 
HRKLGKELDLFM++ EVGQGLPFWLP+GATIRR LERYI DKE+ASGYQHVYTPP+ASVE 
Sbjct: 241 HRKLGKELDLFMISQEVGC^LPFWLPDGATlRRTLERYITDKELASGYQHVYTPPIiASVE 300 

Query: 301 FYKTSG 
YKTSGHWDHY+EDMFP MDMGDGEEFVLRPMNCPHHI+VYK+HV SYRELPIRIAELGM 
Sbjct: 301 LYKTSG 

Query: 361 MHRYEKSGALTGLQRWEMTLTTOAHIFVTPEQIKDEFIjKAIjNIjIAEIYEDFNLTDYRFRIj 420 
MHRYEICSGAL+GLQRVREMT1MD HIFVTPEQI++EF +AL LI ++Y DFNLTDYRFRL 
Sbjct: 361 MHRYEKSGALSGLQRVREMTLNDGHIFVTPEQIQEEFQRALQLIIDVYADFNLTDYRFR1. 420 

SYRDP D HKYYDNDEMWENAQ+MLK A+D+ G+DYFEAEGEAAFYGPKLDIQVKTALGN 

Query: 481 EETLSTIQLDFLLPERFDLKYIGADGEEHRPIMIHRGGISTMERFTAILIETYKGAFPTW 540 
EETLSTIQLDFLLPERFDLKYIGADGEEHRP+MIHRG I STMERFTAILIETYKGAFPTW 
Sbjct: 481 EETLSTIQLDFLLPERFDLKYIGADGEEHRPVMIHRGVISTMERFTAILIETYKGAFPTW 540 

Query: 541 I^PQQVSVIPISNEAHIDYAWEVARVLKDRGIRAEVDDRIJEKMQYKIRAAQTQKIPYQLI 600 
LAP QV+VIPISNEAHIDYAWEVA+ L+DRG+RA+VDDRNEKMQYKIRA+QT KIPYQLI 
Sbjct: 541 LAPHQVTVIPISNEAHIDYAWEVAKTLRDRGVRADVDDRNEKMQYKIRASQTSKIPYQLI 600 

Query: 601 VGDKEMEEKAVNVRRYGSKATETKS IEEFVESILADIARKSRPD 644 
VGDKEME+K+VNVRRYGSK T T+S+EEFVE+ILADIARKSRPD 
Sbjct: 601 VGDKEMEDKSVNVRRYGSKTTHTESVEEFVENILADIARKSRPD 644 
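
The identity count in an alignment summary is the number of columns in which the two aligned rows carry the same residue. The sketch below illustrates this on the 481-540 segment of the GAS/GBS threonyl-tRNA synthetase alignment, which is legible in this copy. The function name is a hypothetical helper, and the count covers only this 60-residue segment, not the full 644-residue alignment.

```python
def count_identities(query, subject):
    """Count columns where both aligned rows show the same residue (gaps excluded)."""
    if len(query) != len(subject):
        raise ValueError("aligned rows must have equal length")
    return sum(1 for q, s in zip(query, subject) if q == s and q != "-")

# Residues 481-540 of the Query and Sbjct rows, as printed in the alignment.
query_481 = "EETLSTIQLDFLLPERFDLKYIGADGEEHRPIMIHRGGISTMERFTAILIETYKGAFPTW"
sbjct_481 = "EETLSTIQLDFLLPERFDLKYIGADGEEHRPVMIHRGVISTMERFTAILIETYKGAFPTW"

identical = count_identities(query_481, sbjct_481)
print(identical, len(query_481))
```

The two differing columns (I/V and G/V) correspond to the "+" and blank positions in the middle line of that alignment segment.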



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 372 

A DNA sequence (GBSx0403) was identified in S.agalactiae <SEQ ID 1213> which encodes the amino 
acid sequence <SEQ ID 1214>. Analysis of this protein sequence reveals the following: 






Final Results 

bacterial cytoplasm --- Certainty=0.1985 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 



Query: 1 MRIGLFTDTYFPQVSGVSTSIRTLKEGLEKEGH3VYIFTTTDRNVKHFEDPTIIRLPSVP SO 

MRIGLFTDTYFPQVSGV+TS IRTLK IjEK+GH V+IFTTTD++V R+ED IIR+PSVP 
Sbjct: 1 MRIGLFTDTYFPQVSGVATSIRTLKTELEKCGHAVFIFTTTDKDVNRYEDWQIIRIPSVP 60 

Query: 51 FISFTDRRWYRGLISAYRIAKDYELDIIHTQTEFSLGLLGKLVAKALRIPWHTYHTQY 120 

F +F DRR YRG A IAK Y+LDI IHTQTEFSLSLLG +A+ L+IPV+HTYHTQY 
Sbjct: 61 FFAFKDRRFAYRGFSKALEIAKQYQLDIIHTQTEFSLGLLGIWIARELKIPV1HTYHTQY 12 0 

Query: 121 EDYVGYIAKGKllKPSMVKYIMRTYLSDLDGVICPSRIVLKLLDGYGVXIPKQVIPTGIP 180 

EDYV Y1AKG LI+PSMVKY++R +L D+DGVICPS IV +LL Y VK+ K+VIPTGX 
Sbjct: 121 EDYVHYIAKGMLIRPSMVKYLVRGFLHD VDGVI CPSEIVRDLLSDYKVKVEKRVI PTGIE 180 

Query: 181 VENYRREDISEETIKNLRTELGLADNDTMLLSLSRVSFEKNIQAILMHLSAWDENPHVK 240 

+ + R +1 +E +K LR++LG+ D + LLSLSR+S+EKNIQA+L+ + V+ E VK 
Sbjct: 181 LAKFERPEIKQENLKELRSKLGIQDGEKTLLSLSRISYEKNIQAVLVAFADVLKEEDKVK 240 

Query: 241 LVIVGDGPYLSDLKELVHSLELENSVIFTGMVEHSQVAIYYKACDFFISA 290 

LV+ GDGPYL+DLKE +LE+++SVIFTGM+ S+ A+YYKA DFFISA 
Sbjct: 241 LWAGDGPYLNDLKEQAQNLEIQDSVIFTGMIAPSETALYYKAADFFISA 290 

A related DNA sequence was identified in S.pyogenes <SEQ ID 1215> which encodes the amino acid 
sequence <SEQ ID 1216>. Analysis of this protein sequence reveals the following: 
Possible site: 17 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.1074 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below: 

Identities = 309/444 (69%) , Positives = 370/444 (82%) 

Query: 1 MRIGBFTDTYFPQVSGVSTSIRTLKEGLEKEGHEVYIFTTTDRNVKRFEDPTIIRLPSVP 60 
MRIGLFTDTYFPQVSGV+TSIRTLKE LEKEGHE VY I FTTTDR+ VKRFEDPTI IRLPSVP 
Sbjct: 1 MRIGLFTDTYFPQVSGVATSIRTLKEELEKEGHSVyiFTTTDRDVKRFEDPTIIRLPSVP 60 

Query: 61 FISFTDRRWYRGLISAYRIAKDYELDIIHTQTSFSLGLLGKLVAKALRIPWHTYHTQY 120 
F+SFTDRRWYRGLIS+Y+IAK Y LDIIHTQTEFSLGLIGK++ K7ALRI PWHTYHTQY 
Sbjct: 61 FVSFTDRRVVYRGLISSYK1AKHYNLDIIHTQTEFSLGLLGKMIGKALRIPVVHTYHTQY 120 

Query: 121 EDWGYIAKGKLIKPSMViaiMRTYIiSDLDGVICPSRIVIMjLDGYGVKIPKQVIPTGIP 180 
EDYV YIA GK+I+PSMVK ++R YL DLDGVI CPSRI VLNLL+GY V IPK+VIPTGIP 

+E Y R+DI+ E + NL+ ELG+A ++TMLLSLSR+S+EKNIQAI+ + A++ EN +K 
Sbjct: 181 LEKYIRDDITAEEVTNLKAELGIAGDEXMLLSLSRISYEKNIQAIINQMPArLAENAKIK 240 

Query: 241 LVIVGDGPYLSDLKELVHSLELENSVIFTGMVEHSQVAIYYKACDFFISASTSETQGLTY 300 
L+IVG+GPYL DLK L LE++ V FTGMV H +VA+YYKACDFFISASTSETQGLTY 
Sbjct: 241 LIIVGNGPYLQDLKHIAMQLEVDKHVTFTGMVPHDKVALYYKACDFF1SASTSETQGLTY 300 

Query: 301 IESLASGRPIIAQSNPYLDDVISDKMFGTLYKKESDLADAILDAIAETPKMTQEAYEQKL 360 
IESLASG PIIA NPYLDDV++DKMFGTLY E+DL DAI+DAI +TP M + +K 
Sbjct: 301 IESLASGTPI IAHGNPYIJ3DWrDKMFGTIjYYAETDLTDAI IDAILKTPVMDKRLLAKKR 360 

Query: 361 YEISAENFSKSWAFYLDFLISQKASVKEKVSLTIGNKDSHSTLRFVRKAVYLPKKVFTF 420 

YEISA++F KS+Y FYLD LI++ + +K+SL + + S+L+ V+ A++LPK+ 
Sbjct: 361 YEISAQHFGKSIYTFYLDTLIARKSKEAQKLSLYLNHSGKSSSLKLVQGAIHLPKRAAKV 420 

Query: 421 TGRASKKWKAPKRRISSIRDFLD 444 

T S KWKAP + + +I+DFLD 
Sbjct: 421 TAITSVKWKAPIKLVHAIKDFLD 444 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 373 

A DNA sequence (GBSx0404) was identified in S.agalactiae <SEQ ID 1217> which encodes the amino 
acid sequence <SEQ ID 1218>. This protein is predicted to be lipopolysaccharide biosynthesis protein- 
related protein. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.4076 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAG19110 GB:AE005009 Vng0600c [Halobacterium sp. NRC-1] 
Identities = 117/350 (33%) , Positives = 178/350 (50%) , Gaps = 29/350 (8%) 

Query: 1 MKVLLYLEAEEYLKKSGIGRAIKHQEKALQIAGIDYTTNPT 41 

M+ L .YLEA E L+ G+ A Q AL+ ++ P 
Sbjct: 2 ME^AIiNYLBAAEALR-GGMVTATNQQRAAIjETTDVEvVETPWRAGDPWSIGSIjAAGGSCF 60 

Query: 42 DDFDLVHMOTYGIRSWLLMSKAKKTGKKVIMHGHSTEEDFRNSFIGSIILVSPLFKWYLCR 101 

FD+ H N G 3 + A++T +++H H T EDF SF GS+ ++P + YL 
Sbjct: 61 TAFDVAHCNLVGPGSVAVARHARRTDTPLVLHAHLTREDFAQSFRGSSTIAPALEPYLRW 120 

Query: 102 FYQKADAIITPTDYSKQLIKAYGIKKPIFVLSNGIDLSRYQRSEKKESAFRHYFHLSKDD 161 

FY +AD ++ P++Y+K +++AY + PI LSNG+DL Q E + R F L D 
Sbjct: 121 FYSQADLVLCPSEYTKDVLRAYPVDAPIRQLSNGVDLESMQGYESFRADTRARFDL--DG 178 

Query: 162 KVVMGAGLYFMRKGIDQFWVAAKMPDIRFIWFGETNKWIPRKVRQIWKQHPSNVTFA 221 

W G F RKG+ F E+ AK D F WFG ++ + P+NVTF 

Sbjct: 179 TVVYAVGEVFERKGLTMFCEL-AKATDHEFAWFGPYDEGPQAGAATRKWVADPPANVTFT 237 

Query: 222 GYIKGDVYEGAMSASDAFFFPSREETEGI WLEALASHQHWLRDI PVYHGWVTE - DSVE 280 

GY++ A A D + FP++ E +GI VLEA+A + WLRDIPV+ + T+ + 

Sbjct: 238 GYMEDK- -RAAFGAGDIYLFPAKTi/ENQGIAVLEAMACGKPVVLRDIPVFREFFTDGEDCL 295 

Query: 281 LATDVDGFVEKLDKVLSGKSDKIKEGYH- - -VAESRSIERIAHELASVYQ 327 

+ + + F + +D++ + + G + AES S++RI ELAS+Y+ 

Sbjct: 296 MCSTFEAFRDAIDRLADDPELRTRLGENARETAESHSLDRIGEELASIYE 345 
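
Each database hit is introduced by a header line carrying a GP accession, a GB accession, a free-text description and the source organism in square brackets. The sketch below splits such a header into its fields. The regular expression and the field names are illustrative assumptions. Whitespace is normalised first so that headers wrapped over two lines are handled.

```python
import re

# Header layout seen in this section: ">GP:<acc> GB:<acc> <description> [<organism>]"
HEADER = re.compile(
    r">GP:(?P<gp>\S+)\s+GB:(?P<gb>\S+)\s+(?P<desc>.*?)\s*\[(?P<organism>[^\]]+)\]"
)

def parse_hit_header(text):
    """Return a dict of header fields, or None if the text is not a hit header."""
    match = HEADER.search(" ".join(text.split()))
    return match.groupdict() if match else None

hit = parse_hit_header(">GP:AAG19110 GB:AE005009 Vng0600c [Halobacterium sp. NRC-1]")
wrapped = parse_hit_header(
    ">GP:BAB04094 GB:AP001508 unknown conserved protein in B. subtilis\n"
    "[Bacillus halodurans]"
)
print(hit)
print(wrapped["organism"])
```

Extracting the organism field in this way makes it straightforward to tabulate which species contribute the closest homologues across the Examples.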

A related DNA sequence was identified in S. pyogenes <SEQ ID 1219> which encodes the amino acid 
sequence <SEQ ID 1220>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.4088 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 236/332 (71%) , Positives = 276/332 (83%) 

Query: 1 MKVIjLYLEAEEYLKKSGIGRAIKHQEKM.QIAGIDYTTNPTDDFDLVHMNTYGIRSMLLM 60 

MKVLLYLEAE YL+KSGIGRAIKHQ KAL + G +TTNP + +DLVH+NTYG++SWLLM 
Sbjct: 1 MKVLLYLEAENYLRKSGIGRAII<HQAKALSLVGQHFTTNPRETYDLVHIjNTYGLKSWLLM 60 

Query: 61 SKAKKTGKKVIMHGHSTEEDFRNSFIGSNLVSPLFKWYLCRFYQKADAI ITPTDYSKQLI 120 

KA+K GKKVIMHGHSTEEDFRNSFI SNL+SP FK YLC FY KADAI ITPT YSK LI 
Sbjct: 61 IKAQKAGKKVIMHGHSTEEDFRNSFIFSNLLSPWFKICYLCHFYNKADAIITPTLYSKSLI 120 

Query: 121 KAYGIKKPIFVLSNGIDLSRYQRSEKKESAFRHYFELSKDDKWMGAGLYFMRKGIDQFV 1B0 

++YG+K PIF +SNGIDL +Y KKE+AFR YF + + +KWMGAGL+F+RKGID FV 
Sbjct: 121 ESYGVKSPIFAVSNGIDLEQYGADPKKEAAFRRYFDIKEGSKWMGAGLFFLRKGIDDFV 180 

Query: 181 EVAAKMPDIRFIWFGETNKWVIPRKVRQIVTKQHPSNVTFAGYIKGDVYEGAMSASDAFF 240 

+VA MPD+RFIWFGETNKWVIP +VRQ+V HP N+ F GYIKGDVYEGAM+ +DAFF 
Sbjct: 181 KVAQAMPDWFIWFGETNKWVIPAQVRQMVNGNHPKNLIFPGYIKGDVYEGAMTGADAFF 240 



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 374 

A DNA sequence (GBSx0405) was identified in S.agalactiae <SEQ ID 1221> which encodes the amino 
acid sequence <SEQ ID 1222>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.5487 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 



Query: 1 MTNELIMQAFEWYLPSDGNHWKKLEES I SDLKKLGI SKIWLPPAFKGTSSDDVGYGVYDL 60 

MTNE +MQ FEWYLP+DG HW+ L E S LK +GISK+W+PPAFKGT S+DVGYGVYDIi 
Sbjct: 1 MTNETWQYFEWYLPNDGKHWQHIJ^DASHLKNIGISKvWMPPAFKGTGSNDVGYGVYDL 60 

Query: 61 FDLGEFDQNGTIRTKYGRKEEYLKLI KSLKANGIKPFAD IVLNHKANGDHKEKFQVI KVN 120 

+DLGEF+QNGT+RTKYG +E+YL + +LK I P +DIVLNHKANGD KE+FQV+KVN 
Sbjct: 61 YDLGEFNQNGTVRTKYGSREDYMAWALKEQEIMPISDIVLNHKANGDAKERFQVVKVN 120 

Query: 121 PENRQE^UJSEPYEIEGWTGFDFPGRQGEYNDFKTi^KlfYHFTGLDYDAKNNETDIFMIVGDN 180 

P NRQE +SEPYEIEGWT F+FPGRQ Y+DFKWEWYHFTG+DYDA +NE I+MI+GDN 
Sbjct: 121 PSNRQEKISEPYEIEGWTQFNFPGRQDNYSDFKWHWYHFTGVDYDALHNENGIYMILGDN 180 

Query: 181 KGWADDDLIDDENGNFDYLMYNDIDFKHPEVIKNLQDWAKWFIETTGIEGFRLDAVKHID 240 

KGWA + ID ENGN+DYLMY+DIDFKHPEV ++L+DW WF+ET+G+ GFRLDA+KHID 



QF L+DV IiHM+ F+A 



: DDSL+ +P++AVTFV+NHD+Q GQALES V +WFKPLAYGLILLRQ+G 



GEF Q SF+ V+DK+ +RQ +V+G + T NCIGWTCLGDEEH 



A related DNA sequence was identified in S.pyogenes <SEQ ID 1223> which encodes the amino acid 
sequence <SEQ ID 1224>. Analysis of this protein sequence reveals the following: 

Possible site: 30 
>>> Seems to have a cleavable N-term signal seq. 

Final Results 

bacterial outside --- Certainty=0.3000 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

>GP:AAB00845 GB:M57692 alpha-cyclodextrin glycosyltransferase 
[Thermoanaerobacterium thermosulfurigenes] 
Identities = 356/710 (50%), Positives = 468/710 (65%), Gaps = 16/710 (2%) 

Query: 7 KTYKLLTKSAVLLGLISFPLT- -VSARDNASVTNKADFSTDTIYQIVTDRFNDGNTSNNG 64 
KT+KL+ + L L+ F LT + AA + +V+N ++STD IYQIVTDRF DGNTSNN 
Sbjct: 3 KTFKLILVlMLSLTLV-FGLTAPIQAaSDTAVSNVV^STDVIYQIVTDRFVDGNTSNNP 61 

Query: 65 KTDVFDKN- -DLKKYHGGDWQGIIAKIKDGYLTDMGISAIWISSPVENIDSIDPSN- - -G 119 
D++D LKKY GGDWQGII KI DGYLT MG++AIWIS PVENI ++ P + G 

S +YHGYWA+DF +TN +FG+ DFQ L+ AH H+IKV+IDFAPNHTS A + 

+G LY NG L+G +++D + F+H TDFS+YE+ IY +++ LADLN N +D Y+K 

AI WLD+G+DGIR+DAVKHM GWQKN++ 

V QA A LTSRGVP IYYGTEQY TG+ DP NR M SFN 

P +Y D h GLL G + V +DG+++ F h AG+VAVW Y 



G ITI G+GFG + GQV FG + I+SW DT + +KVP+V YNIS+ T+ TSN 



+LT QI VR ++N+ TV GE +YL G+V E+G D A+GP+FN Q + 



YP W++D +P 



An alignment of the GAS and GBS proteins is shown below: 

Identities = 112/509 (22%) , Positives = 193/509 (37%) , Gaps = 103/509 (20%) 
EESISD--LKKLGISKIWLPPAFKGTSSDDV GYGV. 











JIVLWHKANGDHKEKFQVIKVWPENRQEA 127 



- -GTTFKEDGALYKNGK LVGKFSDDKDK IFNHESWTDFS i 



G++G R+DAVKH+ 



t VFGE W S T 



T+ D V++N 4 



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 375 

A DNA sequence (GBSx0406) was identified in S.agalactiae <SEQ ID 1225> which encodes the amino 
acid sequence <SEQ ID 1226>. This protein is predicted to be catabolite control protein A. Analysis of this 
protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.2154 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9707> which encodes amino acid sequence <SEQ ID 9708> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:BAA88121 GB:AB028599 catabolite control protein A [Streptococcus 
bovis] (ver 3) 
Identities = 304/332 (91%) , Positives = 320/332 (95%) 

Query: 1 ^OT^DDTITIYDVAREaGVSMAWSRVVNGNKNv^<ENTRKKVLEVIDRLDYRPNAVARGLA 60 
MTDDTITIYDVAREAGVSMAWSRVTOGNKmfKENTRKKVLEVIDRLDYRPNAVARGLA 
Sbjct: 1 MNTDDTITIYDVAREAGVSMAWSRVVNGNKOTKEOTRKKVLEVIDRLDYRPNAVARGliA 60 

Query: 61 SKKTTTVGWI PNIANSYFSILARGIDDIAAMyKYNIVLASSDEDDDKEVNVVNTLFAKQ 120 
SKKTTWGWIPNIANSYFSIIA+GIDDIAAMYKTOIVIiASSDEDDDKEVNVVNTLFAKQ 

VUGIIFMGHHLTEKIRAEFSR3RTP+VIAGTVDLEHQLPSVNIDYKAA DV+DILA N+ 

KDIAFVSGPLIDDINGKVRLAGYKEGL+KN L+FKEGLVFEANY Y +G+ LAQRV+N+G 

ATAAYVAEDEIAAGLLNGLF AGK+VPEDFEI+TSNDSPI YTRPNL+SI SQPVYDLGA 

VSMRMLTKIM+KEELEEKEI+LNHG+ RGTT 
Sbjct: 301 VSMRMLTKIMNKEELEEKEIILNHGLKIjRGTT 332 

A related DNA sequence was identified in S.pyogenes <SEQ ID 1227> which encodes the amino acid 
sequence <SEQ ID 1228>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.2154 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 307/332 (92%) , Positives = 320/332 (95%) 

Query: 1 MNTDDTITIYDVAREAGVS^OTSRVVNGNKIWKENTRKKVLEVIDRLDYRPNAVARGLA 60 

MNTDD +TIYDVAREAGVSMATVSRVVNGNKN\nKENTRKKVLEVIDRLDYRPNAVARGIA 
Sbjct: 1 ^MTDDPLTIYDVAREAGVSMATVSRvWGNKN\^CENTRKKVLEVIDRLDYRPNAVARGLA 60 

Query: 61 SKKTTTVGWI PNIANSYFS ILARGIDDIAAMYKYNIVLASSDEDDDKEVNVVNTLFAKQ 120 
SKKTTWGWIPNIJfflSYFSIIA+GIDDIJUMYKXWIVIASSDEDDDKEVlWVMTLFAKQ 
Sbjct: 61 SKICITWGWIPNIANSYFSILAKGIDDIAAMYICYNIVLASSDEDDDKEVNVVNTLFAKQ 120 

Query: 181 KDIAFVSGPLIDDINGKVRLAGYKEGLKKNGI^FICEGLVFEANYRYAEGFALAQRVINAG 240 
K IAFVSGPLIDDINGKVRLAGYKEGLK N L+FKEGLVFEANY Y EGF IAQRVIN4G 
Sbjct: 181 KCIAWSGPLIDDINGKA7RLAGYKEGLKHJIKLDFKEGLVFEANYSYKEGFELAQRVINSG 240 

Query: 241 ATAAYVAEDELAAGLLNGLFEAGKRVPEDFEIITSNDSPIAQYTRPNLTSISQPA7YDLGA 300 
ATAAYVAEDEIAAGLLNGLFEAGKRVPEDFEI ITSNDSP+ QYTRPNL+SISQPVYDLGA 
Sbjct: 241 ATAAYVAEDELAAGLLNGLFEAGKRVPEDFEIITS1TOSPWQYTRPNLSSISQPVYDLGA 300 

Query: 301 VSMRMLTKIMHKEELEEKEIVLNHGIVKRGTT 332 
VSMRMLTKIM+ KEELEEKE I +LNHG I KRGT" 
Sbjct: 301 VSMRMLTKIMNKEELEEKEILLNHGIKKRGTT 332 



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 376 

A DNA sequence (GBSx0407) was identified in S.agalactiae <SEQ ID 1229> which encodes the amino 
acid sequence <SEQ ID 1230>. This protein is predicted to be PepQ (pepQ-2). Analysis of this protein 
sequence reveals the following: 

Possible site: 22 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.1113 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAC46293 GB:AF014460 PepQ [Streptococcus mutans] 
Identities = 257/359 (71%) , Positives = 304/359 (84%) 

Query: 1 MSKLNRIRHHLHSVQAEI^VFSDPvTVNYLTGFFCDPHERQMFLFVYEDRDPILFVPALE 60 
MSKL +1 L E AV SDPV++NYLTGF+ DPHER MFLF++ D++ +LF+P L+ 
Sbjct: 1 MSKLAQIVQKLKKQGIEAAVLSDPVSINYLTGFYSDPHERLMFLFLFADQETLLFLPELD 60 

Query: 61 VSRAKQSVPFPVFGYIDSENPWQKIAENLPSFSVSOTIAEFDNIiNVTKFQGLQTVFDGHF 120 
RAK + V GY+D ENP +KI + LP + SK+ EFDNLNVTKF+GL+T+F G F 
Sbjct: 61 ALRAKS I LDI S VTGYLDFENPLEKI KTLLPKTNYS KI ALEFDNIjNVTKFKGLETI FSGQF 120 

Query: 121 ENLTPYIQNMRLIKSRDEIEKMLVAGEFADKAVQVGFDNISLNNTETDIIAQIEFEMKKQ 180 
NLTP I MRIiIKS DEI+K+I,+AGE ADKAVQ+GFD+ISLN TETDIIAQIEFEMKK 

QFKKDIY+4CLEA A+DFIKPGV A-r+VDAAAR+VIEKAGYG YFNHRLGHG+GM +H 

Query: 301 EFPSIMAGNDMEIQEGMCFSVEPGIYIPDKVGVRIEDCGYVTKTGFEVFTKTPKELljYF 359 
EFPSIMAGNDM ++EGMCFSVEPGIY1P+KVGVRIEDCG+VTK GFEVFT+TPKELLYF 
Sbjct: 301 EFPSIMAGNDMLLEEGMCFSVEPGI'YIPEKVGWIEDCGHVTKNGFEVFTQTPKELDYF 359 

A related DNA sequence was identified in S.pyogenes <SEQ ID 1231> which encodes the amino acid 
sequence <SEQ ID 1232>. Analysis of this protein sequence reveals the following: 

Possible site: 58 
>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -0.90 Transmembrane 42 - 58 ( 42 - 59) 



Final Results 

bacterial membrane --- Certainty=0.1362 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 



Query: 1 MTKLDQIRLYLDQKGAELAIFSDPVTINYLTGFFCDPHERQLFLFVYHDLAPVLFVPALE 60 

M+KL QI L ++G E A+ SDPV+ INYLTGF+ DPHER +FLF++ D +LF+P L+ 
Sbjct: 1 MSKLAQIVQKLKKQGIEAAVLSDPVSINYLTGFYSDPHERLMFLFLFADQETLLFLPELD 60 

Query: 61 VARASQAISFPVFGYVDSENPWEKIKAVLPNTAAKTIYAEFDHLNVNKFHGLQTIFSGQF 120 

RA + V GY+D ENP EKIK +LP T I EFD+LNV KF GL+TIFSGQF 
Sbjct: 61 ALRAKSILDISVTGYLDFENPLEKIKTLLPKTNYSKIALEFDNLNVTKFKGLETIFSGQF 120 

Query: 121 NNLTPYVQGMRLVKSADEINKMMIAGQFADKAVQVGFDNISLDATETDVIAQIEFEMKKQ 180 

NLTP + MRL+KSADEI K++IAG+ ADXAVQ+GFD+ ISL+ATETD+IAQIEFEMKK 
Sbjct: 121 TNLTPLINRMRLIKSADEIQKLLIAGEIiADKAVQIGFDSISLNATETDIIAQIEFEMKKL 180 



Query: 181 GIHKMSFDTMVLTGNNAANPHGIPGTNNIENNALLLFDLGVETLGYTSDMTRTVAVGQPD 240 

G+ KMSF+TMVLTG+NAANPHG+P ++ IENN LLLFDLGVE+ GY SDMTRTVAVGQPD 
Sbjct: 181 GVDKMSFETMVLTGSNAANPHGLPASHKIENNHIjLLFDLGVESTGYVSDMTRTVAVGQPD 240 

Query: 241 QFKIDIYNLCLEAQLAAIDFIKPGVTAAQVDAAARQVIEKAGYGEYFNHRLGHGIGMDVH 300 

QFK DIYN+CLEAQL a+dfikpgv+aaqvdaaar viekagyg+yfnhrlghgigm +h 

Sbjct: 241 QFKKDIYNICLEAQLTALDFIKPGVSAAQVEAAARSVIEKAGYGDYFNHRLGHGIGMGLH 300 

Query: 301 EFPSrMAGNDLVLEEGMCFSVEPGrYIPGKVGVRIEDCGHVTKNGFEVFTHTPKELLYF 359 

EFPSIMAGND++LEEGMCFSVEPGIYIP KVGVRIEDCGHVTKNGFEVFT TPKELLYF 
Sbjct: 301 EFPS IMAGNDMLLEEGMCFS VEPG1YI PEKVGVRIEDCGHVTKNGFEVFTQTPKELLYF 359 

An alignment of the GAS and GBS proteins is shown below: 

Identities = 238/361 (79%) , Positives = 325/361 (89%) 

Query: 1 MSKLNRIRHHLHSVQAEIAVFSDPVTVNYLTGFFCDPH3RQMFLFVYEDRDPILFVPALE 60 
M+KL++IR +L AELA+FSDPOT+NYLTGFFCDPHERQ+FLFVY D P+LFVPALE 
Sbjct: 1 MTKLDQIRLYLDQKGAELAI FSDPVTINYLTGFFCDPH3RQLFLFVYHDLAPVLFVPALE 60 

Query: 61 VSFAKQSVPFPVFGYIDSENPWQKIASNLPSFSVSICVLAEFDNLNVTKFQGLQTVFDGHF 120 
V+RA Q++ FPVFGY+DSENPW+KI + LP+ + + AEFD+LNV KF GLQT+F G F 
Sbjct: 61 VARASQAISFPVFGYVDSENPWEKIKAVLPNTAAKTI YAEFDHLNVNKFHGLQTIFSGQF 120 

E+KMSFDTMVLTGNNAANPHGIPGTN IENMALLLFDLGV3TLGYTSDMTRTVAVG+PD 

QFK DIY+LCLEA AA1DFIKPGV A++VDAAAR VIEKAGYG+YFNHRLGHG+GMDVH 

Query: 301 EFPSIMAGNDMEIQEGMCFSVEPC-IYIPDCTGVRIEDCGYVTKTGFEVFTKTPKELLYFEG 361 

EFPSIMAGND+ ++EGMCFSVEPGIYIP KVGVRIEDCG+VTK GFEVFT TPKELLYFEG 
Sbjct: 301 EFPSIMAGNDLVLEEGMCFSVEPGIYIPGICVGVRIEDCGHVTKNGFEVFTHTPKELLYFEG 351 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 377 

A DNA sequence (GBSx0408) was identified in S.agalactiae <SEQ ID 1233> which encodes the amino 
acid sequence <SEQ ID 1234>. Analysis of this protein sequence reveals the following: 

Possible site: 14 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3629 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S. pyogenes. 
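
The "Final Results" blocks in these Examples all follow the same three-line pattern, so the predicted compartment can be read off mechanically. The following is a minimal sketch, assuming the lines keep the `bacterial <location> ... Certainty=<value> (<label>)` shape shown above; the function names `parse_final_results` and `best_location` are hypothetical helpers, not part of any localisation tool. The pattern tolerates stray spaces inside the number, which are common in scanned copies of this output.

```python
import re

# Matches one localisation line; spaces inside the number are tolerated.
RESULT_RE = re.compile(
    r"bacterial\s+(cytoplasm|membrane|outside)\W*Certainty\s*=\s*"
    r"([0-9][0-9 .]*)\s*\(([^)]*)\)"
)


def parse_final_results(text):
    """Return {location: (certainty, label)} for every result line found."""
    results = {}
    for match in RESULT_RE.finditer(text):
        certainty = float(match.group(2).replace(" ", ""))
        results[match.group(1)] = (certainty, match.group(3).strip())
    return results


def best_location(text):
    """Return the location with the highest certainty value."""
    results = parse_final_results(text)
    return max(results, key=lambda name: results[name][0])
```

Applied to the block for SEQ ID 1234, `best_location` picks the cytoplasm entry, since the other two certainties are zero.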

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.



Example 378 

A DNA sequence (GBSx0409) was identified in S.agalactiae <SEQ ID 1235> which encodes the amino 
acid sequence <SEQ ID 1236>. This protein is predicted to be beta-hexosaminidase A precursor. Analysis of
this protein sequence reveals the following: 

3 N-terminal signal sequence 



Final Results 

bacterial cytoplasm --- Certainty=0.3279 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAB11942 GB:Z99104 alternate gene name: yzbA-similar to 
beta-hexosaminidase [Bacillus subtilis] 
Identities = 151/602 (25%), Positives = 268/602 (44%), Gaps = 69/602 (11%)

Query: 26
+N M+LDEK+GQ+
Sbjct: 39

Query: 79 ILQTKSKLPMLIAAWTEAGGDGAVTroTKVGDEIKVAA'rNDPKYAYEMG 127
+ K+P++++ + E G + +GT + + A AY+ G
Sbjct: 98 TTKQTVQLTDDYQKASPKIPLMLSIDQEGGIVTRLGEGTKFPGNMALGAARSRINAYQTG 157

Query: 128
I G E SA+G N FSP+VD+ N NP+I R++ +N + L MKG+ + +1
Sbjct: 158

Query: 188
DAG VM H+
Sbjct: 218

Query: 248 KEMHPER--DLDDMLPASMKTLLDELLRGELGYNGAI\i"DASHMVGMTASMARRDLLPT 305

+ + D ++PA+L+K ++ LLR E+G+NG IVTDA +M + + + + 

Sbjct: 278 DTTYKSKLDGSDILVPATLSKKVMTGLLRQEMGFNGVI\rTDALNMKAIADHFGQEEAVVM 337 

Query: 306 AIEAGCDLFLF FNDPDED IQWMKEGYEKGILTEERLHDALRRTLGLKAKLG 356 

A++AG D+ L E+ IQ +KE + G + E+++++++ R + LK K G 

Sbjct: 338 AVKfiG VDIALMPAS VTSLKEEQKFARVI QALKEAVKNGDI PEQQINNSVERI I SLKIKRG 397 

Query: 357 LHI^GRRQKLFMPK-DKAMALINTLESQKIADEVADKAVTLVKDKQKDIFPVWPERYRH 415 

+ YR + KKA++ + +K ++A+KAVT++K++Q + P P++ 
Sbjct: 398 M--YPARNSDSTKEKIAKAKKIVGSKQHLKAEKKLAEKAVTVLKNEQHTL-PFKPKKGSR 454 

Query: 416 ILLVNVEGYKGGFGAMIAGNKQRASDYMKE LLEARGHEVTVWESTEERIMKLPQ 459 

IL+V + A +Q D +K L V+++ E+ +K 

Sbjct: 455 ILIV APYEEQTASIEQTIHDLIKRKKIKPVSLSKMNFASQVFKTEHEKQVK- - - 505 

Query: 470 EERAAAIANVYAQK- QPIANLTEHYDLI INLVDVNAGGTTQRI IWPAAKGTPDQPFYVHE 528 

E I Y K P+ N D +1+ D + + ++P A + H 

Sbjct: 506 -EADYI ITGSYWKNDPWN DGVID- -DTISDSSKWATVFPRA VMKAALQHN 554 

Query: 529 IPSIVISVQHAFALADMPQVGTY1NAYD GLPSTI SAWAKLAGESEFTGVSP 580 

P +++S+++ +A++ IY LIAV + G+++ G P 

Sbjct: 555 KPFVLMSLRNPYDAANFEEAKALIAVYGFKGYAKGRYLQPNIPAGVMAIFGQAKPKGTLP 614 

Query: 581 VD 582 
VD 

Sbjct: 615 VD 616 



No corresponding DNA sequence was identified in S.pyogenes. 

A related GBS gene <SEQ ID 8565> and protein <SEQ ID 8566> were also identified. Analysis of this
protein sequence reveals homology to a lipoprotein and to the following sequences in the
databases:

29.5/52.3% over 422aa 

Bacillus subtilis 

EGAD|20114| hypothetical 70.6 kd protein in feua 5'region precursor Insert characterized

SP|P40406|YBBD_BACSU HYPOTHETICAL 70.6 KDA LIPOPROTEIN IN FEUA-SIGW INTERGENIC REGION
PRECURSOR (ORF1). Insert characterized

GP|1944006|dbj|BAA19499.1||AB002150 YbbD Insert characterized

GP|438455|gb|AAA64351.1||L19954 possible N-terminal signal sequence; mature protein may
be membrane-anchored and start at Cys-17. 17.5% identity
over 354-aa overlap with Candida pelliculosa beta-glucosidase.; putative Insert
characterized

GP|2632433|emb Insert characterized

ORF0043K367 - 1557 of 2388)

EGAD|20114|BS0166(36 - 458 of 642) hypothetical 70.6 kd protein in feua 5'region precursor
{Bacillus subtilis} SP|P40406|YBBD_BACSU HYPOTHETICAL 70.6 KDA LIPOPROTEIN IN FEUA-SIGW
INTERGENIC REGION PRECURSOR (ORF1). GP|1944006|dbj|BAA19499.1||AB002150 YbbD {Bacillus
subtilis} GP|438455|gb|AAA64351.1||L19954 possible N-terminal signal sequence; mature
protein may be membrane-anchored and start at Cys-17. 17.5% identity over 354-aa overlap
with Candida pelliculosa beta-glucosidase.; putative {Bacillus subtilis} GP|2632433|emb

%Match = 9.6
%Identity = 29.5 %Similarity = 52.2
Matches = 119 Mismatches = 183 Conservative Sub.s = 92



114 144 174 204 234 264 294 324 

LMVGDSLGDLAAAEQNGIAFYPVIiVGKEWSWElLREDIGEAFAKGQFEQQRQKESINTEWANLDN**KG*AMTHLVDLT 

60 MRPVFPLILSAVLFLSCFFGA 

10 20 

354 384 414 426 456 486 528 
KKPFNLNQEAIEWIEKTINEMTLDEKIGQLFF NMGASRSEEYLTDVLDRYHIAAVRYNRGS SSEIYDQ 



WO 02/34771 



PCT/GB01/04789 



-483- 

I : II : ' 



NLIL QTKSKLPMLIAANTEAGGDGAVTDGTKTGDEL^ 

= I : l = |:::= • I I = = II = = I lb I I I I 11 = 1 I 111 = 11 = 

TVQLTDDYQKASPKIPLMLSIDQEGGIVTRLGEGT^ 



I 11=1 l== =1 = I 111= = =1 lilll 1=11= =1 = = II 

NPDNPVIGVRSFSSNRELTSRLGLYTMKGLQRQDIASALKHFPGHGDTDVDSHYGLPLVSHGQERLREVELYPFQKMDA 
200 210 220 230 240 250 260 

1023 1053 1080 1107 1137 1167 1197 1227 

GLPGW^GHIHLPNVEKEMHPER-DLDDML-PASLNKTLLDELLRG 

i ii i = = = i = -m = i 11 = 1 = 1 == iii 1 = 1 = 11 inn =i = = = = i = = i 

GADMVMTAHVQFPAFDDTTYKSKLDGSDILVPATLSKKVM^ 

280 290 300 310 320 330 340 

1290 1320 1350 1380 1410 1437 

GCDLFLF FNDPDE---DIQWMKEGYEKGILTEERLHDALRRTLGLKAKLGLHWYEGRRQELFMPK-DKAMALIN 

| |: | : = : || ,|| : | : | : | : || | |: | | : | || :: 

GVDIALMPASOTSLKEEQKFARVIQALKEAVKNGDIPEQQINNSVERIISLKIKRGM--YPARNSDSTKEKIAKAKK1VG 
360 370 380 390 400 410 

1467 1497 1527 1557 1587 1617 1647 1677 

TLESQKIADEVADKAOTOTKDKQKDIFPWPERYRHILLV^ 

= = I =:|=||ll==l==l =1 l== 11=1 = = I l=== 

SKQHLKAEKKrAEKAVTVLKNEQ-HTLPPKPKKGSRILIVAPYEEQTASIEQTIHDLIKRKKIKPVSLSraviWFASQVFKT 
430 440 450 460 470 480 490 

SEQ ID 1236 (GBS50) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 11 (lane 8; MW 69.2 kDa).

GBS50-His was purified as shown in Figure 192, lane 5. 

The GBS50-His fusion product was purified (Figure 192, lane 5) and used to immunise mice. The resulting 
antiserum was used for FACS (Figure 264), which confirmed that the protein is immunoaccessible on GBS 
bacteria. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 379 

A DNA sequence (GBSx0410) was identified in S.agalactiae <SEQ ID 1237> which encodes the amino 
acid sequence <SEQ ID 1238>. Analysis of this protein sequence reveals the following: 

j N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.2266 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 380 

A DNA sequence (GBSx0411) was identified in S.agalactiae <SEQ ID 1239> which encodes the amino 
acid sequence <SEQ ID 1240>. Analysis of this protein sequence reveals the following: 

3 N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.2279 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9705> which encodes amino acid sequence <SEQ ID 9706> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database: 



Query: 26 NKWVITGAGGVLCGYMAKEFAKAGAKVALLDI^'QEAAQTFADEIVEEGGIAKAYKANVL 85 

NK+++ITGAGGVLC ++AK+ A A +ALLDIM EAR A EI + GG AKAYK NVL 
Sbjct: 15 NKLI I ITGAGGVLCSFLAKQIAYTKANIALLDIilFEAADKVAKEINQSGGKAKAYKTNVL 74 

Query: 86 SKENLEEVHQAVLEDLGPTDILVNGAGGNNPKATTDNEFHELDLPSETKTFFELDEAGIS 145 

EN++EV + D G DIL+NGAGGNNPKATTDNEFH+ DIi T+TFF+LD++GI 
Sbjct: 75 ELENIKEVRNQIETDFGTCDILINGAGGNNPKATTDNEFHQFDLNETTRTFFDLDKSGIE 134 

Query: 146 FVFNLNYLGTLLPTQVFAQDMVGREGANI INI SSMNAFTPLTKI PAYSGAKAA1SNFTQW 205 

FVFNKNYLG+LLPTQVFA+DM+G++GANIINISSMNAFTPLTKIPAYSGAKAA.ISNFTQW 
Sbjct: 135 FVFJILNYLGSLLPTQVFAKDMLGKQGANIINISSMNAFTPLTKIPAYSGAKAAISNFTQW 194 

Query: 206 IAVHFSKVGIRCNAIAPGFLVTNQNRSLLFTEIX3QPTARAEKILNNTPMGRFGEASELIG 265 

IAV+FSKVGIRCNAIAPGFLV+NQN +LLF +G+PT RA KIL NTPMGRFGE+ EL+G 
Sbjct: 195 IAVYFSKVGIRCNAIAPGFLVSNQNIALLFDTEGKPTDRANKILTNTPMGRFGESEELLG 254 

Query: 266 GLFFIADEKSSSFVNGWLPIDGGFAAYSGV 296 

L FD DE S+FVNGWLP+DGGF+AYSGV 
Sbjct: 255 ALLFLIDENYSAFVNGWLPVDGGFSAYSGV 285 

A related DNA sequence was identified in S.pyogenes <SEQ ID 1241> which encodes the amino acid 
sequence <SEQ ID 1242>. Analysis of this protein sequence reveals the following: 

d N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.0358 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 77/279 (27%), Positives = 125/279 (44%), Gaps = 19/279 (6%) 

Query: 18 MSKTITFTNKWVITGAGGVLCGYMAKEFAKAGAKVA^^ 77 

M + K+ +ITGA + +AK +A+AGA + D+ QE EGA 
Sbjct: 1 MEMMFSLQ/3KIALITGASYGIGFEIAKAYAQAGATIVFNDIKQELVDKGLflAYRELGIEA 60 



Query: 78 KAYKANVLSKERLEEVHQAVLEDLGPTDILWGAGGNNPKATTDNEFHELDLPSETKTFF 137 
IIRRTPML 103

Query: 138 ELDEAGISFVTOIOTIfiTLLPTQVFAQDWGREGMIIINISSIOTiFTPLTKIPAYSGAKA 197 

E+ V ++4 + ++ M+ + IINI SM + + AY+ AK 

Sbjct: 104 EMAAEDFRQVIDIDLNAPFI VSKAVLPS.MI AKGHGK I INI CSMMSELGRETVSAYAAAKG 1S3 

Query: 198 AISNFTQWI^VHFSKVGIRCNAIAPGFLVraQlffiSLLFTE-DGQPTARAEKILNNTPMGR 256 

+ T+ +A F + I+CN I PG++ T Q L + DG + 1+ TP R 

Sbjct: 164 GLKMLTKNIASEFGEANIQCNGIGPGYIATPQTAPLRERQADGSRHPFDQFIIAKTPAAR 223 

Query: 257 FGEASELIGGLFFLADEKSSSFVNGWLPIDC-GFAAYSG 295 

+G +L G FLA + +S+FVNG +L +DGG AY G 
Sbjct: 224 WGTTEDLAGPAVFLASD-ASNFVNGHILYVDGGIIiAYIG 261 
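
Each alignment in this section is summarised by a line of the form `Identities = a/n (p%), Positives = b/n (q%), Gaps = c/n (r%)`. The sketch below shows how such a line can be parsed and how each percentage can be recomputed from its counts. It assumes the percentages are truncated rather than rounded, which matches the summary line of the alignment above (77/279 is 27.6% and is reported as 27%), but that is an inference from the figures, not a documented rule. The names `parse_summary` and `inconsistent_fields` are hypothetical.

```python
import re

# One "Name = count/length (percent%)" field of a summary line.
SUMMARY_RE = re.compile(
    r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)\s*\(\s*(\d+)\s*%\s*\)"
)


def parse_summary(line):
    """Return {field: (count, length, reported_percent)} for one summary line."""
    return {
        m.group(1): (int(m.group(2)), int(m.group(3)), int(m.group(4)))
        for m in SUMMARY_RE.finditer(line)
    }


def inconsistent_fields(line):
    """List fields whose reported percent differs from floor(100 * count / length)."""
    mismatched = []
    for name, (count, length, reported) in parse_summary(line).items():
        # Integer division truncates, the assumed convention here.
        if (100 * count) // length != reported:
            mismatched.append(name)
    return mismatched
```

A check of this kind is mainly useful for spotting digits that were misread when the text was scanned: a field whose count and percentage disagree is a candidate for comparison against the original document.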

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 381 

A DNA sequence (GBSx0412) was identified in S.agalactiae <SEQ ID 1243> which encodes the amino 
acid sequence <SEQ ID 1244>. This protein is predicted to be D-mannonate dehydrolase (uxuA). Analysis 
of this protein sequence reveals the following: 

N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.3188 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 



Query: 1 MEMSFRWYGEDDPVTLENIGQIPTMKGIVTAIYDVPVGEVWSRERIQQLKEKVEAAGLKI 60 

M ++ RW+G D V LE I QIP MKGIV+AIYDV VG VW +E+I LK +E GL + 
Sbjct: 1 MRLTMRWFGPSDKVKLEYIKQIPG^GIVSAIYDVAVGGVWPKEKILALKNNIERHGLTL 60 

Query: 61 SVIESVPVHEDIKLGRPTRDLLIDNYIQTVKNLAAEGIDTICYNFMPVFD1»ITRTDLAYQY 120 

VIESVPVHEDIKLG+PTRD I+NY QT+++LA GIDT+CYNFMPVFDWTR+ L ++ 
Sbjct: 61 DVIESVPVHEDIKLGKPTRDRYIENYKQTLRHLAECGIDTVCYNFMPVFDWTRSQLDFKL 120 

Query: 121 PDGSTALIFDETVSKKMDPVNGELSLPGWDASYSKEEMKAIMDAYAEIDEEKLWENLTYF 180 

DGS ALI++E V + +P++GEL LPGWD SY E +K ++ AY +1 EE LW++LTYF 
Sbjct: 121 EDGSEALIYEEDVISRTNPLSGELELPGWDTSYENESLKGVLQAYKKISEEDLWDHLTYF 180 

Query: 181 I KRI I PEAEAVGVKMAIHEDDPPYS I FGLPRI ITGLEAIERFVKLYDSKSNG ITLCVGSY 240 

++ I+P A+ VG+KMAIHPDDPP+SIFGLPRI+T +ER + LYDS ++GIT+C GS 
Sbjct: 181 VQAIMPVADEVGIKMAIHPDDPPWSIFGLPRIVTNKANLERLLSLYDSPNHGITMCSGSL 240 



++ ND+ E+ R R++F HARNIK +SF+ESAH S 



Query: 301 FGFEGAIREDHGRMIWGETGRPGYGLYDRALGATYVSGLYEAV 343 

GF G 4RPDHGRMIWGE GRPGYGLYDRALGATY++G++EAV 
Sbjct: 301 IGFTGPLRPDHGRMIWGEKGREGYGLYDRALGATYLNGIWEAV 343 



No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 382 

A DNA sequence (GBSx0413) was identified in S.agalactiae <SEQ ID 1245> which encodes the amino 
acid sequence <SEQ ID 1246>. Analysis of this protein sequence reveals the following: 

i N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.2447 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 383 

A DNA sequence (GBSx0414) was identified in S.agalactiae <SEQ ID 1247> which encodes the amino 
acid sequence <SEQ ID 1248>. This protein is predicted to be uronate isomerase. Analysis of this protein 
sequence reveals the following: 

J- terminal signal sequence 



Final Results

bacterial cytoplasm --- Certainty=0.3066 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database:

[Bacillus halodurans] 
294/465 (62%) , Gaps = 7/465 (1%) 

Query: 3 FNTETFMLKNQAAIQLYEE-VKRQPIFDYHCHLDPKDIFEDHIFDNIVDU1LGGDHYKWR 61
F +E F+L N+ +LY K PI DYHCHL P++I+E+ F+N+ WLGGDHYKWR
Sbjct: 4 FLSEDFLLMNEYDRELYYTFAKNMPICDYHCHLSPQ3IWENKPFENMTKAWLGGDHYKWR 63

Query: 62 LMRANGISFJffilTGPASNLEKFKAFARTLERAYGNPVYHWSAMELKNVFGVNEILTESNA 121
MR NG+ E 1TG A + EKF A+A+T+ + GNP+YHW+ MELK F ++ L E+N
Sbjct: 64 AMRI^GWEEFITGGAPDKEKFLAWAKT\rPKTIGNPLYHV5THMELKTYFHFHQPLDETNG 123

Query: 122 EEIYHRI^FLKEHKISPRRLIADSKAmFIGTTDHPLDTLEWHKKLAADESFKTWAPTF 181
E ++ N L++ +PR LI S V IGTTD P D+Ii +H+KL AD++F V PTF
Sbjct: 124 ENVVTOACNRLLQQEAFTPRALIERSNVRAIGTTDDPTDSLLYHQKLQADDTFHVKVIPTF 183

Query: 241 VFEQTDELELNDLFNKVCEGYIPNQSEISKTft'QTAVFMELCRLYKKYGFVTQVHFGALRNN 300
F + +E E +F K + E K++T + L + Y G+V Q H G +RNN
Sbjct: 244 PFVEVNEQEAQHIFRKRLANEGLTKVEraKYKTFLMTOLGKEYAARGWVMQra 303

Query: 301 HSTIFEKLGADVGVDSLGD-QVALTVNWJRLLDSLVKKDSLPKMIWYNLNPAYNIAVANT 359

Query: 360 LANFQANELGVRSYLQFGAGWWFADTKLGMISQMtrAIAEQGMjANFIGMLTDSRSELSYQ 419

+ NF E GVR +QFG+ WWF D GM Q+ LA G+L+NFIGMLTDSRSFLSY 
Sbjct: 362 IGNF--TESGVRGKVQFGSAWWFNDHIDGMRRQLTDIASVGLLSNFIGMLTDSRSFLSYP 419 

Query: 420 RHDYFRRILCTYLGEWIEEGEVPEDYQALGSMAKDIAYQNAVNYF 464 

RHDYFRRILC +G WI+EG4-+P D + G + +DI Y N V+YF 
Sbjct: 420 RHDYFRRILCQLIGSWIKEGQLPPDMERWGQIVQDICYNNWDYF 464 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 384 

A DNA sequence (GBSx0415) was identified in S.agalactiae <SEQ ID 1249> which encodes the amino 
acid sequence <SEQ ID 1250>. This protein is predicted to be 2-dehydro-3-deoxyphosphogluconate 
aldolase/4-hydroxy-2-oxoglutarate al. Analysis of this protein sequence reveals the following: 

Possible site: 43 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3883 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9703> which encodes amino acid sequence <SEQ ID 9704> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAD35160 GB:AE001693 2-dehydro-3-deoxyphosphogluconate 

aldolase/4-hydroxy-2-oxoglutarate aldolase [Thermotoga maritima] 
Identities = 93/199 (46%) , Positives = 125/199 (62%) , Gaps = 6/199 (3%) 

K + AV+R S E+A E A GG+ IE+TF+ P+A VIK+LS F K 1+ 

Sbjct: 8 KKHKIVAVLRANSVEEAKEKAI^VFEGGTOLIEITFWPDADWIKELS--FLKEK33AII 65 

Query: 97 GAGTVMTTELAKEAIDAGAKFLVSPHFDSDIANIANS^ 156 

GAGTV + E ++A+++GA+F+VSPH D +1+ E V+Y PG T TE+V A K 
Sbjct: 66 GAGTVTSVEQCRKAVESGAEFIVSPHLDEEISQFCKEKGVFYMPGVMTPTELVKAMKLGH 125 

Query: 157 QIIKLFPGGWGPGFIKDIHGP1PDVDLMPSGGVSV3NWEWRKAGAVAVGVGSALSSKV 216 

I+KLFPG WGP F+K + GP P+V +P+GGV++ NV EW KAG +AVGVGSAL 
Sbjct: 126 TILKLFPGEWGPQWKAMKGPFPOTKFVPXGGVmjDOTCEWFKAGVLAVGVGSALVKGT 185 

Query: 217 ATEGYDSVTKIAKQFVSAL 235 

DV + MFV + 
Sbjct: 186 P DEVREKAKAFVEKI 200 

A related DNA sequence was identified in S.pyogenes <SEQ ID 1251> which encodes the amino acid 
sequence <SEQ ID 1252>. Analysis of this protein sequence reveals the following: 

3 N-terminal signal sequence 

Final Results

bacterial cytoplasm --- Certainty=0.1039 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 82/204 (40%) , Positives = 132/204 (64%) 

Query: 32 M1MQLKNNYFFAVIRGKSSEDALEIAKHAILGGIR1IIEVTFSTPEASKVIKQLSDDFKNN 91 

+L +LK N V+RG+SSE+AL + +1 GGI+ IEVT++ P AS+VI QL++ FK + 
Sbjct: 6 I liTKLKANRLVLWRGESSEEALACSLAS IEGG I KT I E VTYTNPFASEVIGQLAERFKED 65 

Query: 92 KEI 1 VGAGTVMTTELAKEAIDAGAKFLVS PHFD SD I ANLANENKVYYFPGCATATEI WA 151 

E+++GAGTV+ A++AI AGA+F4V P+F+ +A + + + Y PGC T E+V A 
Sbjct: 66 PEVLIGAGTVLDDWARQAIIAGAQFIVGPNFJ^AVALICHRYSIPYLPGCMTVNEVVTA 125 

Query: 152 RKYKCQIIKLFPGGWGPGFIICDIHGPIEDVDLMPSGGVSVSNWEWRKAGAVAVGVGSA 211 

+ ++K+FPG VG FI+ I P+P V++M +GGVS N+ +W AG +G+G 
Sbjct: 126 LESGVDMVKIFPGSTVGISFIRAIKSPLPQVEVMVTGGVSSDNLKDWLAAGVDVLGIGGE 185 

Query: 212 LSSKVATEGYDSVTKIAKQFVSAL 235 

+ + + Y+ +TK A ++ +L 
Sbjct: 186 FNQLASQKQYNL1TKKAAHYIKSL 209 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 385 

A DNA sequence (GBSx0416) was identified in S.agalactiae <SEQ ID 1253> which encodes the amino 
acid sequence <SEQ ID 1254>. This protein is predicted to be pyruvate dehydrogenase complex repressor.
Analysis of this protein sequence reveals the following: 

Possible site: 26 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2827 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAB12044 GB:Z99105 similar to transcriptional regulator (GntR 
family) [Bacillus subtilis] 
Identities = 67/225 (29%), Positives = 119/225 (52%), Gaps = 17/225 (7%) 

Query: 3 RPLVEOTADFiDHLILEREYPVGAKLPNEYELAEDLDVGRSTIREAVRSDATRNILEVRQ 62 

+ L +Q +R++HL+ + G KLP E EL + L V R +REA+ SI T ++ + 
Sbjct: 16 KTLAKQVIERIVHLLSSGQLRAGDKLPTEMELMDlLHVSRPVIjREALSSLETLGVITRKT 75 

Query: 63 GSGTYISSKKGVSEDPLGFSLIKDTDRLTSDLFELRLIjLEPRIAELVAYRITDDQLQIjTjE 122 

GTY 4 K G+ P L TDL + + ER+LE + +A+I +++LQ L4 
Sbjct: 76 RGGTYFNDKIGM--QPFSVMIALATDNI1PA-IIEARMALELGLVTIAAEKINEEELQRLQ 132 



- -HAGDPKHLLLDVEFHSMLAK^SGNIMiIDSLLPVINQSIHLINANYTNR 180 
K + DI ++ H G+ D EFH ++A + N ++ ++ QS+ + +A ++ 
Sbjct: 133 KTIDDIANSTDNHYGE ADKEFHRIIALSANNPWEGMI QSLLITHAKIDSQ 183 

Query: 181 QMKBDSDEaHREIIKAIREKNPVAAHDA.VLI>IHIMSTORSALK 222 

+ + ++E H++I A+ +++P AH M H+ VR LK 
Sbjct: 184 IPYRERDVTVEYHKXIYDALAQRDPYKAHYHMYEmKFVRDKILK 228 

A related DNA sequence was identified in S.pyogenes <SEQ ID 1255> which encodes the amino acid 
sequence <SEQ ID 1256>. Analysis of this protein sequence reveals the following: 

r> N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.2161 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 24/51 (47%) , Positives = 35/51 (68%) 

Query: 22 YPVGAKLPNEYErAEDLDVGRSTIREAVRSIATRNILEVRQGSGTYISSKK 72 

+P+G++LP+E LAE V R T+R+A+ h ILE R GSGTY++S + 
Sbjct: 30 WPIGSRLPSERHIAEHFTVSRMTLRQAITLLVEEGILERRIGSGTYV7ASHR 80 
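
The identity and gap counts quoted for each alignment can be recounted directly from a pair of aligned rows. The following is a minimal sketch, assuming both rows have the same length and use `-` for a gap; `alignment_stats` is a hypothetical helper and the strings in the usage note are toy data, not sequences from this document. Counting "Positives" would additionally need a substitution matrix, which is outside this sketch.

```python
def alignment_stats(query, sbjct):
    """Count identical columns and gap columns in two aligned rows.

    Returns (identities, gaps, alignment_length). Both rows must have the
    same length, with '-' marking a gap position.
    """
    if len(query) != len(sbjct):
        raise ValueError("aligned rows must have equal length")
    identities = 0
    gaps = 0
    for q, s in zip(query, sbjct):
        if q == "-" or s == "-":
            gaps += 1
        elif q == s:
            identities += 1
    return identities, gaps, len(query)
```

For the toy rows `MKT-LV` and `MRTALV`, the function returns four identities and one gap over six columns.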

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 386 

A DNA sequence (GBSx0417) was identified in S.agalactiae <SEQ ID 1257> which encodes the amino 
acid sequence <SEQ ID 1258>. Analysis of this protein sequence reveals the following: 



Final Results

bacterial cytoplasm --- Certainty=0.2178 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



A related GBS nucleic acid sequence <SEQ ID 9701> which encodes amino acid sequence <SEQ ID 9702> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database: 



Query: 23 MLYPLLTKTRNTYDLGGIWNFKLGEHNPN ELLPSDEVMVIPTSFNDLMVSKEK 75
ML P+ T TR L G+W F L N L + +P SFND +
Sbjct: 1 MLRPVETPTREIKKLDGLWAFSLDRENCGIDQRWWESALQESRAIAVPGSFNDQFADADI 60

Query: 76 RDYIGDFmrEKVIEVPKVSEDEEMVLRFGSVTHQAKIYVDGVLVGEHKGGFTPFEVLVPE 135
R+Y G+ WY++ + +PK + +VLRF +VTH K++V+ V EH+GG+TPFE V
Sbjct: 61

Query: 136
f +++++C NN L++ T+P G I E+G KKK
Sbjct: 121

Query: 196
H DIT+ + ++D A + + VN +V+ + DD ++V
Sbjct: 177

Query: 254
QFL+N KPYFG
Sbjct: 234

Query: 314
L+ +GANS+RTSHYPY+EEM+ AD G++VIDE
Sbjct: 293

Query: 374 PAVGLFQNFNASLDLS EKDNGTWNLM- -QTKAAKEQAIQELVKRDKNHPSWMW 425
AVG EH SL + PK+ + + +T+ AH QAI+EL+ RDKNHPSWMW

Sbjct: 353 AAVG FNLSLGIGFEAGNKPKELYSEZA^/NGETQQAHLQAIKELIARDKNHPSWMW 408 

Query: 426 WANEPASHEAGAHDYFEPLVIO,YI<DLDPQKRPVTLVNILMATPDRDQVMDLVDVVCLNR 485 

+ANEP + GA +YF PL + + LDP RP+T VN++ D + DL DV+CLNR 

Sbjct: 409 SIANEPDTRP0^3AREYFAPIiAEATRKLDPT-RPITCVH\WFCDAHTDTISDLFDVLCIJSIR 467 

Query: 486 YYGWYVDHGDLTNAEVGIRKELLEWQDKFPDKPIIITEYGADTLPGLHSTWNIPYTEEFQ 545 

YYGWYV GDL AE + KELL WQ+K +PIIITEYG DTL GLHS + ++EE+Q 
Sbjct: 468 YYGWWQSGDLETAEKVLEKELLAWQEKL-HQPIIITEYGVDTLAGLHSMYTD^1WSEEYQ 526 

Query: 546 CDFYEMSHRVFDGIPNLVGEQVWNFADFETNLMILRVQGNHKGLFSRNRQPKQWKEFKK 605 

C + +M HRVFD + +VGEQVWNFADF T+ ILRV GN KG+F+R+R+PK +K 
Sbjct: 527 CAWLDMYHRVFDRVSAWGEQVWNFADFATSQGILRVGGNKKGIFTRDRKPKSAAFLLQK 586 

Query: 606 RW 607 
RW 

Sbjct: 587 RW 588 

A related DNA sequence was identified in S.pyogenes <SEQ ID 1259> which encodes the amino acid 
sequence <SEQ ID 1260>. Analysis of this protein sequence reveals the following: 

Possible site: 23 

>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -4.04 Transmembrane 1131 -1147 (1130 -1147) 



Final Results 

bacterial membrane --- Certainty=0.2614 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

>GP:AAF97242 GB:AF282987 beta-galactosidase precursor [Streptococcus pneumoniae] 
Identities = 303/921 (32%) , Positives = 463/921 (49%) , Gaps = 86/921 (9%) 





[Alignment row starts: Query 5 / Sbjct 96; Query 59; Query 116; Query 175 / Sbjct 276; Query 235 / Sbjct 336; Query 293 / Sbjct 392; Query 351 / Sbjct 452]

? LY+L +Y GQ++D

A R+L +K+MG N+IR-fTHNP+S +

Query: 411
MV KN+P++ MWSIG
Sbjct: 512 GGK--KPYDYGRFFEKD&THPEARKGEK WSDFDLRTMVERGKNNPAIFMWSIG 562

Query: 471 MELMEGPSflDVSHYPELTRQMCQMITAIDTSRPITFGDNKIiKEflDFC-WHEEVSQM&TIiL 529 

NE+ G + +H +++ + 1 +D +R +T G +K + +■ HE+++ 
Sbjct: 5S3 NEI - -GEANGDAHSIATOKRLVKVIKDVDKTRYWMGADKFRFGNGSGGHEKIA 614 



Sbjct: 615 DELD AVGFNYSE-DNYKALRAKHPKWLIYGSETSSATRTRGSYYRPERELKHSNGP 669 

Query: 586 - -GYHLTSYDHAKVDWGAFASQAWYDTITRDFV- -AGECVWTGFDYLGEPTPWUKIDSGV 641 

Y 4- Y + +V WG A+ +W T RD AG+ +WTG DY+GEPTPW+ + 
Sb j at : 670 ERNYEQSDYGNDRVGWGKTATASW- - TFDRDNAGYAGQFIWTGTDYIGEPTPWHNQNQTP 727 

Query: 642 VGLWPSPKNAYFGILDTAGFPKDSYYFYQSQW-- 

V K++YFGI+DTAG PK +Y YQSQW 

Sbjct: 728 V KSSYFGIVDTAGIPKHDFYLYQSQWSVKKKPMVHLLPHWNWENKEIASKVAD 781 

Query: 695 EQGLVEVWYSNAASVQLMFEDEQGNLTDYGRKAFHTYSTPTGHTYQLYQGADAAKMPHE 754 

+G + V YSNA+SV+L N G K F+ T G TYQ +GA+A 
Sbjct: 782 SEGKIPVPAYSNASSVELFL NGKSLGLKTFNKKQTSDGRTYQ- -EGANA N 829 

Query: 755 NLYLTWRVPYQKGLLRAVRYDISGKSIPKTSGRSQVRTYGSVAKLSWKAFEAPIDAPW-E 813 

LYL W+V YQ G L A+A DSGKI R++TGA+ +IA + 

Sbjct: 830 ELYLEWKVAYQPGTLEAIARDESGKEI ARDKITTAGKPAAVRLIKEDHAIAADGKD 885 

Query: 814 LLYLDLSLLDSRGELVSHAQDWLQVQVEGPARLIALDNGNPTDHTPYQEP LRQAY 868 

L Y+ +4DS4G +V A + ++ Q+ G +L+ +DNG Y+ . +R+A+ 

Sbjct: 886 LTYIYYEIVDSQGNWPTANNLVRFQLHGQGQLVGVDNGEQASRERYKAQADGSWIRKAF 945 

Query: 869 GGKLLAILALTGEAGHIKVTA 889 

GK 4-AI + T +AG +TA 
Sbjct: 946 NGKGVAIVKSTEQAGKFTLTA 966 

An alignment of the GAS and GBS proteins is shown below: 

Identities = 98/414 (23%) , Positives = 175/414 (41%) , Gaps = 64/414 (15%) 



Query: 54
Sbjct: 86 LPHDFSLTQPYTRNGEAESAYKLGGVG- -WYRHYLVLDEVLAGCHVAITFEGSYMETEIY 143

[Alignment row starts: Query 114 / Sbjct 144; Query 174 / Sbjct 180; Query 228 / Sbjct 239; Query 268 / Sbjct 299; Query 327 / Sbjct 359; Query 384 / Sbjct 419]

V+G +G+H G+ F

■ +G++-R + L + P-t- H

L LLKDMGAN+ R++H P S ++ +LA+R+G VI+E

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 387 

A DNA sequence (GBSx0418) was identified in S.agalactiae <SEQ ID 1261> which encodes the amino 
acid sequence <SEQ ID 1262>. This protein is predicted to be 2-keto-3-deoxygluconate kinase. Analysis of 
this protein sequence reveals the following: 

Possible site: 13 

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -0.53 Transmembrane 197 - 213 ( 197 - 213)

Final Results

bacterial membrane --- Certainty=0.1213 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9699> which encodes amino acid sequence <SEQ ID 9700>
was also identified. 

The protein has homology with the following sequences in the GENPEPT database: 



Query: 14 KIISLGEVIiRLSPPQYHTLMQANHLKCQFGGSELNVLASLAQLGyHVGLVSALPDNDLG 73 

K+++ GE++LRLSPP + + Q + +GG+E NV A LAQ+G V+ LP+N LG 

Sbjct: 2 KVVTFGEIMLRLSPPDHKRIFO/TDSFDVTYGGAEANVAAPIAO^GLDAYFVTKLPNNPLG 61 

Query: 74 KMASQFILSQQISPAAIIKKEGRLGIYYyEQGFSVRTNKVIYDRNYSSFWESTLSDYDFT 133 

A+ 4 + I + R+GIY+ E G S R +KV+YDR +S+ E+ D+D+ 
Sbjct: 62 DAaaGHLRKFGVKTDYIARGGNRIGIYFLEIGASQRPSKVVYDRAHSAIEEAKREDFDWE 121 



Query: 134 SIFKGVDWFHVSGITPALTKDLYEVTRFLMTKAKEGGVKVSIDLNFRESLWSSFQEAREQ 193
I G WFH SGITP L K+L + + A E GV VS DLN+R LW+ +EA++
Sbjct: 122 KILDGARWFHFSGITPPLGKELPLILEDALKVAIffiKGVTVSCDIjNYRARLWTK-EEAQKV 180

Query: 194 LSPLLGLLDVCFGLEPIYLAGESEDLKDELGLSRPYLDI ELLEKITQKIVQEY 246
+ P + +DV L ED++ LG+S LD+ E KI +++ ++Y
Sbjct: 181 MIPFMEYVDV LIANEEDIEKVLGI SVEGLDLKTGKLNREAYAKIAEEVTRKY 232

Query: 247 GLDYIAFTQREMEYTNQYMLKSYLYHNNMLYQTDKTGVEVLDRVGTGDAFAAGLIHALLE 306
+ T RE ++ N + +++ + ++DRVG GD+FA LI+ L
Sbjct: 233 NFKTVGITLRESISATVNYWSVMVFENGQPHFSKRYEIHIVDRVGAGDSFAGALIYGSLM 292

Query: 307 KETPQRALEIAMATFKYKHTIQGDINIMTRDDIAYLIEKSTN 348
Q+ E A A KHTI GD +++ ++I L T+
Sbjct: 293 GFDSQKKAEFAAAASCLKHTIPGDFVVLSIEEIEKLASGATS 334



A related DNA sequence was identified in S.pyogenes <SEQ ID 1263> which encodes the amino acid 
sequence <SEQ ID 1264>. Analysis of this protein sequence reveals the following: 

d N- terminal signal sequence 



Final Results 

bacterial cytoplasm --- Certainty=0.0708 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 111/319 (34%) , Positives = 168/319 (51%) , Gaps = 7/319 (2%) 

[Alignment row starts: Query 12; Query 72 / Sbjct 74; Query 132 / Sbjct 134; 192; Sbjct 252]

FGGSE+N+ +L

R+G+YY E GF R ++V YDR SSF

+ IF+G+ FH SGI+ AL K

Query: 312
R L+ A+AT K T+ D
Sbjct: 307 RNLDFAVATASLKCTVAED 325

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 388 

A DNA sequence (GBSx0419) was identified in S.agalactiae <SEQ ID 1265> which encodes the amino 
acid sequence <SEQ ID 1266>. Analysis of this protein sequence reveals the following:

Possible site: 15 

>>> Seems to have an uncleavable N-term signal seq 

INTEGRAL Likelihood = -1.17 Transmembrane 5 - 21 ( 5-21) 

Final Results 

bacterial membrane --- Certainty=0.1468 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 
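
Where the analysis reports a membrane location, it also lists the predicted membrane-spanning segment on an `INTEGRAL Likelihood = ... Transmembrane start - end` line. The sketch below extracts the likelihood and segment boundaries from such a line and computes the segment length. It is a minimal illustration under the assumption that the line keeps this shape; `parse_transmembrane` and `segment_length` are hypothetical names, and the bracketed range that follows on each line is ignored.

```python
import re

# Likelihood value, then the first "start - end" range after "Transmembrane".
TM_RE = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+(?:\.\d+)?)\s+"
    r"Transmembrane\s+(\d+)\s*-\s*(\d+)"
)


def parse_transmembrane(line):
    """Return (likelihood, start, end) for one INTEGRAL line, or None."""
    match = TM_RE.search(line)
    if match is None:
        return None
    return float(match.group(1)), int(match.group(2)), int(match.group(3))


def segment_length(start, end):
    """Number of residues in an inclusive start-end segment."""
    return end - start + 1
```

For the segment reported for SEQ ID 1266 (residues 5 to 21), `segment_length` gives 17 residues.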

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 389 

A DNA sequence (GBSx0420) was identified in S.agalactiae <SEQ ID 1267> which encodes the amino 
acid sequence <SEQ ID 1268>. Analysis of this protein sequence reveals the following: 



3 N-terminal signal sequence 



INTEGRAL Likelihood 
INTEGRAL Likelihood =-11 
INTEGRAL Likelihood = -9 
INTEGRAL Likelihood = -7 
Likelihood = -4 
Likelihood = -4 
Likelihood = -3 
Likelihood = -1 



Transmembrane 198 - 214 ( 191 - 

- 462 ( 437 - 

- 110 ( 91 - 

- 307 ( 283 - 

- 281 ( 257 - 282) 

- 337 ( 318 - 

- 422 ( 405 - 

- 137 ( 121 - 





Transmembrane 



Final Results 

bacterial membrane --- Certainty=0.5819 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 



Query: 36 IYLFTFMFVTYFSTGVLGSAAIFVSQIMGYIRIFDGFIDPA1GIMIDKTDTKFGKYRPIL 95 
IY ++ +F T V G +A + +RI D DP IG ++D+T+++F ++RP h 

Sbjct: 27 lYATVSTYLLFFYTDVFGLSAaAAGTMFLWRilDAIADPFIGTIVDRTNSRFARFRPYL 86

Query: 96 IIGNVITALSLIFLLALRGVDENIRFPLFILVLIIHKIGYEMQQTITKAGQTALTNDPKQ 155 

+ G A+LL + ++I+GS+T ALT+ 

Sbjct: 87 LFG AFPFVILAILCFTTPDFSDMGKLIYAYITYVGLSLTYTTINVPYGALTS-AMT 141 


Query: 156 RPIFNIvDAVMTTSLMTGGQFWSVFLVPKFGNFTPQFFNVLIFGTILISAILAIV- -AI 213 

R +V L +V F VP + G L IL ++ + 

Sbjct: 142 RNNQEWSITSVRMLFANLGGLVmFFVPLIAAYLSOTSGNESLGWQLTMGILGMIGGCL 201 

Query: 214 IGIWAKDRKEFFGLGENTQKTALKDYWKVLKGNKPLQILSIAAALVKFA1QFFGDSV-VM 272

+ K KE L ++ +K D ++ + N+PL +LSI ++ F + +SV + 
Sbjct: 202 LIFCFKSTKERVTLQKSEEKIKFTDIFEQFRVNRPLWLSIFFIII-FGVNSISNSVGIY 260 

Query: 273 VLLFGI LFGNYALSGQFSLLFIVPGVIINILFSTIARKKGLRFSYVRAIQIGMIGL 328 

+ + +LYLGL I+P I L + +KK L + A+ + +IGL

Sbjct: 261 YVTYNLEREDLVKWYGLIGSLPALVILP--FIPRLHQFLGKKKLLNY ALLLNIIGL 314 

Query: 329 IAFGAA/LYVGKPGDLSLTSljNLYTILFIVTNIIARYASQAPASIiVLTMGADISDYETSES 388 
LA L + N+Y IL V +IA S + + + +Y +' 

Sbjct: 315 LAL LFVPPSNVYLIL - -VCRLIAAAGSLTAGGYMWALIPETIEYGEYRT 361

Query: 389 GRYVSGMIGTIFSLTDSIASSFAPMWGFVLAGIGFSKSFPTIETPLPPDLKMAAISILV 448 

G+ + G+I I + 4V G VL G+ P M + 

Sbjct: 362 GKRMGGLI YAI IGFFFKFGMALGGWPGLVLDKFGY VANQAQTPAALMGILITTT 416 


Query: 449 AIPFIALSIALLLMKFYKLDKEEMVRIQEKIQ 480 

IP L +AL+ + FY LD+++ + +++ 
Sbjct: 417 IIPVFLLVLALIDINFYNLDEKKYKNMVRELE 448 

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 390 

A DNA sequence (GBSx0422) was identified in S.agalactiae <SEQ ID 1269> which encodes the amino 
acid sequence <SEQ ID 1270>. Analysis of this protein sequence reveals the following:

Possible site: 52 

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.3375 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database: 



>GP:AAB17663 GB:U31175 D-specific D-2-hydroxyacid dehydrogenase [S. aureus]
Identities = 165/331 (49%) , Positives = 231/331 (68%) , Gaps = 1/331 (0%) 

Query: 1 MMIOjKVFI^EEEATIAQDWANRM^ 5 0
M K+ F R+ E +A +W +M+VE++ S+ L+ TV++++ +DG+ Q L++
Sbjct: 1 MTKIMFFGTRDYEKEMAI^GKKKm^OTTSKELLSSAr^QLKDYDGTOTMQPGKLEND 60

Query: 61
+YP L+ GIKQIAQR+AG DMY+L+LAK+H I + ISNVPSYSPE+IAE++V+IAL L+R+
Sbjct: 61

Query: 121
r NMTVAI IGTGRIG ATAKI+ GFG
Sbjct: 121

Query: 181
D L Y +SV+EA+++AD++SLH+P E+ HLF+ MF KKGAIL+N ARGA++
Sbjct: 181

Query: 241
T DL+ A++ G 1 ffl IDTYE E Y + +DI H L LI i.
Sbjct: 240

Query: 301
+ +DEAV+NL VEG LNA 4- VI TGT T++N
Sbjct: 300 FSDEAVQNLVEGGLNAALSVINTGTCETRLN 330

There is also homology to SEQ ID 124. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 391 

A DNA sequence (GBSx0423) was identified in S.agalactiae <SEQ ID 1271> which encodes the amino 
acid sequence <SEQ ID 1272>. Analysis of this protein sequence reveals the following: 

D N-terminal signal sequence 



Final Results

bacterial cytoplasm --- Certainty=0.2364 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.


Example 392 

A DNA sequence (GBSx0424) was identified in S.agalactiae <SEQ ID 1273> which encodes the amino 
acid sequence <SEQ ID 1274>. This protein is predicted to be regulatory protein (pfoS/R). Analysis of this 
protein sequence reveals the following: 
Possible site: 37 

>>> Seems to have a cleavable N-term signal seq.

INTEGRAL Likelihood =-12.90 Transmembrane 64 - 80 ( 53 - 89)



Final Results

bacterial membrane --- Certainty=0.6158 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9325> which encodes amino acid sequence <SEQ ID 9326> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAC65034 GB:AE001189 regulatory protein (pfoS/R) [Treponema 
pallidum] 

Identities = 33/91 (36%) , Positives = 55/91 (60%) , Gaps = 1/91 (1%) 

Query: 1 MAmHAKPiaMLPMISSAAILGILGALFNIQGTPASAGFGISGLIGPlNALNLAKGGWSV 60 

M N + P + +P++ + + G+L LFN+QGTPASAGFG GL+GPINA L V 
Sbjct: 250 MPMVXRyPII^IPLLI^GLVCGVIAWLI^C^TPASAGFGFIGLVGPINAYKLMAYTPMV 309 

Query: 61 MNMLLIIIIFVAAPIILNFIFNYLFIJOTLKI 91 

+L 4+ FV + + ++ 4-++ + LK+ 
Sbjct: 310 RAGILFLWFVLS-FLAAYLIDFILVDRLKL 339 

A related DNA sequence was identified in S.pyogenes <SEQ ID 1275> which encodes the amino acid
sequence <SEQ ID 1276>. Analysis of this protein sequence reveals the following:

Possible site: 51
>>> Seems to have a cleavable N-term signal seq.

INTEGRAL Likelihood =-12.31 Transmembrane 141 - 157 ( 133 - 166) 
INTEGRAL Likelihood = -6.00 Transmembrane 92 - 108 ( 88 - 112) 

Final Results 

bacterial membrane --- Certainty=0.5925 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

>GP:AAC65034 GB:AE001189 regulatory protein (pfoS/R) [Treponema 
pallidum] 

Identities = 63/178 (35%) , Positives = 107/178 (59%) , Gaps = 10/178 (5%) 

Query: 2 IGQGIASLLGLQPILMSLLIAMIFCFLIVSPITTVGIALAINLSGIGSGAASFG 55 

+G+ IA+ 4 LQP+LMS+L++M F +T+SP+++V + +A+ L+G+ SGAA+ G 
Sbjct: 164 VGRVIATFIALQPLLMSILLSMSFSLIIISPVSSVAVGIAVGLTGIASGAANIGVSSCAM 223 

Query: 56 - LCLAGWAVNSKGTSLAHVLRSPKI SMANVLS KPKI MLPMLCSAAVLGVTGAI FNIQGTP 114 

L+ VH G LA +K+MN+ P + +P+L + V GV+ +FN+QGTP 
Sbjct: 224 TLIVGTMRVNKIGVPLAMFAGAMIO^PNWIRYPII^IPLLI^GLVCGVIAWLENLQGTP 283 

Query: 115 ASAGFGISGLIGPINALTO^AKGGWCP-WILLIIIIFVGAPIVLNMIFNYLF 171 

ASAGFG GL+GPINA L + P V ++ +++ + + +++ + LK+ 

Sbjct: 284 ASAGFGFIGLVGPINAYRLM--AYTPMVR&GILFLWFVLSFLAAYLIDFILVDRLKL 339 

An alignment of the GAS and GBS proteins is shown below: 

Identities = 86/101 (85%) , Positives = 96/101 (94%) 

Query: 1 MANVLAKPKIMLPMISSAAILGILGALFNIQGTPASAGFGISGLIGPINALNLAKGGWSV 60 

MANVL+KPKIMLPM+ SAA+LG++GA+FNIQGTPASAGFGISGLIGPINALNLAKGGW 
Sbjct: 81 MAN^SKPKIMLPMLCSAAVLGVIGAIFl'IICGTPASAGFGISGLIGPINALNLAKGGWCP 140 

Query: 61 MNMLLIIIIFVJmPIIMFIFI^LFIKvLKIIDPMDYKLDI 101

+N+LLIIIIFV API+LN IFNYLFIKVLK+IDPMDYKLDI 
Sbjct: 141 WILLI III WGAPIVLNMIFNYLFIKVLKVIDPMDYKLDI 181

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 
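
Each alignment above is summarised by a line such as "Identities = 86/101 (85%)". As a minimal sketch (not part of the original analysis), the counts on such a line could be extracted and the identity percentage recomputed. The names `parse_summary` and `truncated_percent` are hypothetical, and the use of integer truncation for the percentage is an assumption that happens to match the Identities figures quoted in this section.

```python
import re

# Hypothetical helper, not part of the original analysis.
# Matches fields such as "Identities = 86/101 (85%)".
SUMMARY_RE = re.compile(
    r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)\s*\((\d+)%\)"
)


def parse_summary(line):
    """Return {field: (matches, aligned_length, reported_percent)}."""
    fields = {}
    for name, num, den, pct in SUMMARY_RE.findall(line):
        fields[name] = (int(num), int(den), int(pct))
    return fields


def truncated_percent(num, den):
    """Integer percentage with the fractional part dropped (assumed convention)."""
    return (100 * num) // den


summary = parse_summary("Identities = 86/101 (85%) , Positives = 96/101 (94%)")
matches, length, reported = summary["Identities"]
print(matches, length, reported, truncated_percent(matches, length))
```

Comparing the recomputed value with the reported one is a quick way to spot digits that were corrupted during text extraction.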



Example 393 

A DNA sequence (GBSx0426) was identified in S.agalactiae <SEQ ID 1277> which encodes the amino 
acid sequence <SEQ ID 1278>. This protein is predicted to be regulatory protein (pfoS/R). Analysis of this 
protein sequence reveals the following: 



Possible site: 48

>>> Seems to have no N-terminal signal sequence
INTEGRAL Likelihood =       Transmembrane
INTEGRAL Likelihood =       Transmembrane
INTEGRAL Likelihood =       Transmembrane
INTEGRAL Likelihood =       Transmembrane
INTEGRAL Likelihood = -1.33 Transmembrane

Final Results

bacterial membrane --- Certainty=0.3633 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



A related GBS nucleic acid sequence <SEQ ID 9735> which encodes amino acid sequence <SEQ ID 9736>
was also identified. 

A related GBS nucleic acid sequence <SEQ ID 9691> which encodes amino acid sequence <SEQ ID 9698> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAC65034 GB:AE001189 regulatory protein (pfoS/R) [Treponema
pallidum]

Identities = 61/158 (38%) , Positives = 92/158 (57%) 

Query: 24 KSFIMNVMGLALGTVIVLIPGAILGELMKALLPI«fSGFATLIAATAVATSMMGLVIGIM 83 
+ F+M +LNG + G VI L+P AI GEL +AL P+ FA L + +IG +

Sbjct: 9 RQFMMKILNGSSAGIVIGLVPPAIAGELFRALAPLSPLFAALYHWLPIQFSVPALIGTL 68 

Query: 84 VGLNFKFNPIQSASLGLAVMFAGGA&TFLKGAIMLKGTGDIINMGITAALGVLLIQFLSD 143 
VGL F + + A+L + A G T GA ++ G GD+IN+ + +AL ++L++ L 
Sbjct: 69 VGLQFHCSAPEVATLAFVSVIASGNvTLQNGAl'fLITGIGDVINVMLISALAIILVRALRG 128

Query: 144 KTKSFTLIVIPTVTLLLVGGVGHVLLPYVKMITTMIGQ 181 

K S T+I +P + ++ GGVG LPYVKMIT +G+ 
Sbjct: 129 KLGSLTIIALPVIVAWAGGVGSFSLPYVKMITLFVGR 166 



A related DNA sequence was identified in S.pyogenes <SEQ ID 1279> which encodes the amino acid
sequence <SEQ ID 1280>. Analysis of this protein sequence reveals the following:

Possible site: 48 
>>> Seems to have no N-terminal signal sequence 



INTEGRAL Likelihood =-13 Transmembrane
INTEGRAL Likelihood =-   Transmembrane
INTEGRAL Likelihood =    Transmembrane
INTEGRAL Likelihood =    Transmembrane
INTEGRAL Likelihood =    Transmembrane
INTEGRAL Likelihood =
INTEGRAL Likelihood =
INTEGRAL Likelihood =
178 - 215)

Final Results

bacterial membrane ---
bacterial outside ---
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

>GP:AAC65034 GB:AE001189 regulatory protein (pfoS/R) [Treponema 
pallidum] 

Identities = 137/346 (39%) , Positives = 217/346 (62%) , Gaps = 14/346 (4%) 

Query: 12 FMNK^GTAIAIWALIPNAIIATFLKPLLP-NMAAAEFLHIVQVFQFFTPIMAGPLIG 70
FM K+L G++ IV+ L+P AI + L P + A H+V QF P + G L+G
Sbjct: 11

Sbjct: 71

Query: 131
K GSLTII LP+ + G +G LPYV +T +G+ I +F LQP+LMSIL
Sbjct: 123

Query: 191
++4-+FSLII+SP+S+VA+G+A+GL G+A+GAA++G++S A L+ T++VNK GVP+A+
Sbjct: 183

Sbjct: 251

Query: 309
Sbjct: 303

An alignment of the GAS and GBS proteins is shown below: 

Identities = 65/172 (37%), Positives = 95/172 (54%), Gaps = 9/172 (5%)

Query: 19 EKQTTKSFIM^^VI^GLALGTVIVlIPGAIIi3ELMKALLPMWSGFATLIA^TAVATS^#IGL 78 

+K+T SF+ VL G A+ V+ LIP AIL +K LLP + A + V + 
Sbjct: 5 DKETFSSF]^KWLAGTAIAIVVALIPNAILATFLKPLLPNMAA-ABFLHIVQVFQFFTPI 63 

Query: 79 VIGIMVGLNFKFNPIQSASLGLAVMFAGGAaTFLK GAIMLKGTGDIINMGIT 130 

+ G ++G FKFNP+Q +4G A GA + 4 G L+G GD+INM IT 

Sbjct: 64 MAGFLIGQQFKFNPMQQLAVGGAAYIGSGAWAYTEVIQKGVATGTFQLRGIGDLINMMIT 123 

Query: 131 AALGVLLIQFLSDKTKSFTLIVIPTVTLLLVGGVGHVLLPYVKMITTMIGQG 182 

A4-L VL 4+4 +K S T+I++P VG +G LPYV +TT+IGQG 

Sbjct: 124 ASLAVLAVKYFGKKFGSLTIILLPITIGTGVGYIGWKFLPYVSYVTTLIGQG 175 

A related GBS gene <SEQ ID 8567> and protein <SEQ ID 8568> were also identified. Analysis of this
protein sequence reveals the following:

Lipop: Possible site: -1 Crend: 10
McG: Discrim Score: -13.49 
GvH: Signal Score (-7.5): -5.82 

Possible site: 48
>>> Seems to have no N-terminal signal sequence



Likelihood = - 

Likelihood = - 

INTEGRAL Likelihood = - 

INTEGRAL Likelihood = - 

INTEGRAL Likelihood = - 
PERIPHERAL Likelihood = 
modified ALOM score: 1.82 

*** Reasoning Step: 3 



i threshold: 
Transmembrane 
Transmembrane 
Transmembrane 
Transmembrane 
Transmembrane 
51 



124 - 140

Final Results

bacterial membrane --- Certainty=0.3633 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>


The protein has homology with the following sequences in the databases: 

ORF01226(352 - 843 of 1218) 

EGAD|138195|TP0038(3 - 166 of 350) regulatory protein {Treponema pallidum} OMNI|TP0038
regulatory protein (pfoS/R) GP|3322298|gb|AAC65034.1||AE001189 regulatory protein (pfoS/R)
{Treponema pallidum} PIR|E71373|E71373 probable regulatory protein (pfoS/R) - syphilis
spirochete
%Match =13.6 

%Identity =37.2 %Similarity =59.1 

Matches = 61 Mismatches = 67 Conservative Sub.s = 36 


273 303 333 363 393 423 453 483 

I*FFPIFLLQIAMI*LI*LVKSQTIIISRRHLM£DWEKQTTKSFIM^ 

: : = hi :||| = I II hi II III =11 h 
MHTQSLSPRQF^IKILNGSSAGIVIGLVPPAIAGELFRALAPLSPL 
10 20 30 40

513 543 573 603 633 663 693 723 

FATLIAATAVATS^MGLVIGIMVGLNFKFNPIQSASLGLAVMFAGGAATFLKGAIMLKGTGDIINMGITAALGVLLIQFL 

II I = >|| :||l I = = hi : : I I I : II :: I Ihlh = :|| ::|»: I 

FAALYHWLPIQFSVPALIGTLVGLQFHCSAPEVATIAFVSVIASGNvTLQNGAWLITGIGDVINVMLISAIAIILvRAL
60 70 80 90 100 110 120 

753 783 813 843 873 903 933 963 

SDKTKSFrLIVIPTVTIiLLVGGVGHVI.LPWKKITrMIGQGTRRTHENFLFILLCPDINFEKIPF*INDLLSLFLQIIGL 

| |:|:| :| : III I Mllllll =1 =

RGKLGSLTIIALPVIVAWAGGVGSFSLPyVKMITLFVGRVIAT?IAI.QPLLMSILLSMSFSLIIISPVSSVAVGIAVGL 
140 150 160 170 180 190 200 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics.

Example 394 

A DNA sequence (GBSx0428) was identified in S.agalactiae <SEQ ID 1281> which encodes the amino
acid sequence <SEQ ID 1282>. This protein is predicted to be cyn operon transcriptional activator. Analysis
of this protein sequence reveals the following:

Possible site: 15

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAB15857 GB:Z99123 alternate gene name: ipa-24d~similar to 
transcriptional regulator (LysR family) [Bacillus subtilis]

Identities = 87/282 (30%), Positives = 152/282 (53%), Gaps = 5/282 (1%) 

Query: 1 MDIRQLTYFIAVAFAKNYSRMUCSLFVTQPTLSQ 60 
MDIR LTYF+ VA K++++A++SL+V+QPT+S+ IK LE EL LF +NGRQ+ LT+A 
Sbjct: 1 ^IRELTyFLEVARLKSFTKASQSLyvSQETISKMIKNLEEELGIELFYRNGRQVELTnA 60



Query: 61 geilyekgqllmtnvnqmvteiqqlnqekkegirvgltslfaiqfmkqi-stfmathsnv 119 

G +Y + Q ++ + + +E+ + + KK +R+GL + F ++ F + NV 
Sbjct: 61 GHSMWQAQEIIKSFQNLTSELNDIMEVKKGHVRIGLPPMIGSGFFPRVLGDFRENYPNV 120

Query: 120 EVSLIQDGSRKLQELLAKGKIDIGLLSFPSTRMDITIEPLQTSTKGYKVSIVMPKSHPLA 179

L++DGS K4QE + G +DIG++ P+ + + T + +V+ SH LA 
Sbjct: 121 TFQLVEDGSIKVQEGVGDGSLDIGWVLPANEDIFHSFTIVKET LMLWHPSHRLA 176 

Query: 180 ThPEIEIMDhRDYKVASl^HmLGEMLPRKCILALgFDPHIVFKimimVhlHShQDmA 239

E +L +L+D E ++L + +C GF PH1+++ 4- W+ + + 

Sbjct: 177 DEKECQLRELKDEPFIFFREDFVLH1TOIMTECIKRGFRPHIIYETSQWDFISEMVSANLG 235 

Query: 240 VTILPSEFESISQVQDLCWVPLKDKNNFYPIGIAYRNOTSFS 281 
+ +LP + + +PL D + + I +R D S

Sbjct: 237 IGLLPERI CRGLDPEKVKVI PLVDPVI PWHLAI IWRKDRYLS 278 

A related DNA sequence was identified in S.pyogenes <SEQ ID 1283> which encodes the amino acid 
sequence <SEQ ID 1284>. Analysis of this protein sequence reveals the following: 

Possible site: 21

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1101 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below: 
Identities = 125/160 (78%) , Positives = 144/160 (89%) 

Query: 135 I^GKIDIGLLSFPSTRNDITIEPLQTSTKGYKVSIVMPKSHPLATLPEIELNDLRDYKV 194 

L++GKIDIGLLSF S R DITIE LQTSTKGYKVS IV+ K HPLA P+++L DL+ YK+ 
Sbjct: 1 LSQGKIDIGLLSFLSIRKDITIEIiQTSTKBYKVSIVLLKQHPLAQHPQLKLKDLKGYKI 60 

Query: 195 ASIOTHYMLGEMLPRKCRALGFDPHIVFKHNDWEVLIHSLQDLNAVTILPSEFESISQVQ 254 

ASLN+HYMLGEMLPRKCRALGF+P IVFKHNDWElTiIHSL DT.N +TILPS+FES++QV 
Sbjct: 61 ASITOHYMLGEMLPRKCRALGFEPDIWKHNDWFATLIHSLHDLNTLTILPSDFESLNQVD 120 

Query: 255 DLCWVPLKDKNNFYPIGIAYRNDTSFSPMIEEFLSLLKTN 294 

+L W+PL+DKNNFYPIGIAYR+D SFSP+IEEFLSLLKTN 
Sbjct: 121 NLWIPLQDKNNFYPIGIAYRDDASFSPVIEEFLSLLKTN 160

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 395 

A DNA sequence (GBSx0429) was identified in S.agalactiae <SEQ ID 1285> which encodes the amino 
acid sequence <SEQ ID 1286>. Analysis of this protein sequence reveals the following: 

Possible site: 47

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.1833 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

Signal peptide: 1-21 

A related GBS nucleic acid sequence <SEQ ID 8569> which encodes amino acid sequence <SEQ ID 8570> 
was also identified. 



The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes.

SEQ ID 8570 (GBS271) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 51 (lane 8; MW 31.3kDa). It was also expressed in E.coli as a GST-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 55 (lane 6; MW 56.3kDa) and in Figure 
62 (lane 10; MW 56.3kDa). 

GBS271-GST was purified as shown in Figure 210, lane 8. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 396 

A DNA sequence (GBSx0430) was identified in S.agalactiae <SEQ ID 1287> which encodes the amino 
acid sequence <SEQ ID 1288>. Analysis of this protein sequence reveals the following: 

Possible site: 22 

>>> Seems to have an uncleavable N-term signal seq 

INTEGRAL Likelihood = -6.74 Transmembrane 9 - 25 ( 5 - 28) 
INTEGRAL Likelihood = -5.84 Transmembrane 97 - 113 ( 92 - 122)
INTEGRAL Likelihood = -5.47 Transmembrane 37 - 53 ( 35 - 61)
INTEGRAL Likelihood = -2.55 Transmembrane 220 - 236 ( 220 - 238)
INTEGRAL Likelihood = -1.65 Transmembrane 64 - 80 ( 63 - 81) 
INTEGRAL Likelihood = -1.28 Transmembrane 193 - 209 ( 192 - 209) 
INTEGRAL Likelihood = -0.53 Transmembrane 125 - 141 ( 125 - 141) 

Final Results

bacterial membrane --- Certainty=0.3697 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database:

>GP:AAC73593 GB:AE000155 putative metal resistance protein
[Escherichia coli K12] 
Identities = 128/252 (50%) , Positives = 186/252 (73%) 

Query: 5 NSISLMSLLMASSLVLITLFFSYWQKLNLEKEVIISAIRAVIQLLAVGFLLDYIFGYQNP 64 

++I+ SL +A LV++ + S+ +KL LEK+++ S RA+IQL+ VG++L YIF + 
Sbjct: 13 HNITNESILAIoALMLVWAILISHKEKl^ALEKDILWSVGRAIIQLIIVGYVLBCYIFSVDDA 72 

Query: 65 IFTALLMLFMIINASYNAAKRGKGINKGFVISFIAIGSGTIITLSVLIFSGILKFVPNQM 124 

T L++LF+ NA++NA KR K I K F+ SFIAI G ITL+VLI SG ++F+P Q+ 
Sbjct: 73 SLTLLMVLFICFNAAWNAQKRSKYIAKA7ISSFIAITVGAGITLAVLILSGSIEFIPMQV 132 

Query: 125 IPVGGMIISNSMVAIGLCYKQLLSEFRSKQEEVETKLALGADILPASIDIIRDVIKTGMV 184 

IP+ GMI N+MVA+GLCY h S+Q++++ KL+LGA AS +IRD 1+ ++ 

Sbjct: 133 IPIAGMIAGNAMVAVGLCYNNLGQRVISEQQQIQEKLSLGATPKQASAILIRDSIRAALI 192 

Query: 185 PTIDSAKTLGIVSLPGMMTGLILAGTSPIQAVKYQMMVTFMLLATTSIASFVATYLAYKI 244 

PT+DSAKT+G+VSLPGMM+GLI AG P++A+KYQ+MVTFMLL+T S+++ +A YL .Y+ 
Sbjct: 193 PTVDSAKTVGLVSLPGMMSGLIFAGIDPVKAIKYQIMVTFMLLSTASLSTIIACYLTYRK 252 

Query: 245 FFNNRKQLWTK 256 

F+N+R QLWT+ 
Sbjct: 253 FYNSRHQLWTQ 264 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 397

A DNA sequence (GBSx0431) was identified in S.agalactiae <SEQ ID 1289> which encodes the amino 
acid sequence <SEQ ID 1290>. This protein is predicted to be SUGAR TRANSPORT ATP-BINDING 
PROTEIN. (b0490). Analysis of this protein sequence reveals the following: 

3 N- terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.1903 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAC73592 GB:AE000155 putative ATP-binding component of a 
transport system [Escherichia coli K12] 
Identities = 95/202 (47%), Positives = 142/202 (70%), Gaps = 2/202 (0%) 

Query: 4 

Sbjct: 8 

Query: 64 GKDIMQLEPIESRKMISYCFQTPHLFGNTVEDNISFPYHIRHEKvDYRRVDDLFQRFEMD 123 

G+D++ L+P R+ +SYC QTP LFG+TV DN+ FP+ IR+ + D D +RF + 

Sbjct: 68 GEDVSTLKPEIYRC2QVSYCAQTPTLFGDTVYDNLIFPWQIRNRQPDPAIFLDFLERFALP 127 

Query: 124 QSYLKQDVKKLSGGEKQRIALIRQLLFEPKvLLLDEVTSALDNHNKAIVEKVI-KSLHDK 182 

S L ++■+ +LSGGEKQRI+LIR L F PKVLLLDE+TSALD NIC V ++I + + ++ 
Sbjct: 128 DSILTKNIAELSGGEKQRISLIRHLQFMPKVLLLDEITSALDESNKHNVNEMIHRYVREQ 187 

Query: 183 GITILWITHDEEQSRRFANKVL 204 

I +-LW+THD+++ A4-KV+ 
Sbjct: 188 NIAVLWVTHDKDEINH-ADKVI 208 

A related DNA sequence was identified in S.pyogenes <SEQ ID 1291> which encodes the amino acid 
sequence <SEQ ID 1292>. Analysis of this protein sequence reveals the following: 

d N- terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.2053 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 73/214 (34%) , Positives = 133/214 (62%) , Gaps = 9/214 (4%) 







Sbjct: 6

Query: 62
I L PI R I FQ LF + TV +N++F ++ +K +RV + +
Sbjct: 66 LDGERINDL-PINKRD-IHTVFQ^^YALFPH^m/FENVAFALKLKKVDKKEIAKRVKETLK 123

Query: 119
v ++KLSGG++QR+A+ R ++ +P+V+LLDE
Sbjct: 124

Query: 179
\- GIT +++THD4-E++ ++ 4
Sbjct: 183

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 398 

A DNA sequence (GBSx0432) was identified in S.agalactiae <SEQ ID 1293> which encodes the amino 
acid sequence <SEQ ID 1294>. Analysis of this protein sequence reveals the following: 

3 N-terminal signal sequence 



Final Results 

bacterial cytoplasm --- Certainty=0.0658 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 399 

A DNA sequence (GBSx0434) was identified in S.agalactiae <SEQ ID 1295> which encodes the amino 
acid sequence <SEQ ID 1296>. This protein is predicted to be deda protein (dedA). Analysis of this protein 
sequence reveals the following: 

Possible site: 53 

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood =-12.05 Transmembrane 186 - 202 ( 178 - 208) 

INTEGRAL Likelihood = -8.81 Transmembrane 65 - 81 ( 61 - 89) 

INTEGRAL Likelihood = -7.54 Transmembrane 26 - 42 ( 24 - 47) 

INTEGRAL Likelihood = -0.37 Transmembrane 152 - 168 ( 152 - 168) 



Final Results

bacterial membrane --- Certainty=0.5819 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database:



Query: 2 FLIDFILHIDTHIYAMANTVGNWTYLLLFLVIFVT3TGAVIFPFLPGDSLLFAAGALAANP 61 

FLIDFILHID H+ + G W Y +LFL++F ETG V+ PFLPGDSLLF AGALA+ 
Sbjct: 6 FLIDFILHIDVHLAELVAEYGVWVYAILFLILFCETGLVVTPFLPGDSLLFVAGALASLE 65 

Query: 62 KMSFNIVTFLIIFFIAAFIGDSCNFLIGRTFGYRFIKHP-- -FFRRFIKEKNIRDAELYF 118 

N+ +++ IAA +GD+ N+ IGR FG + +P FRR +K ++ 
Sbjct: 66 TMDLNVHI*IVvLMLIAAIVGDAVNYTIGRLFGEKLFSNPNSKIFRRSYLDK THQFY 121 

Query: 119 EKKGTAAIILGRYIPIIRTFVPFVAGISQLPPKVFIKRAFIAALSWSVIATGSGFLFGNI 178 

EK G IIL R++PI+RTF PFVAG+ + +F IALW++T +G+ FG I 

Sbjct: 122 EKHGGKTIILARFVPIWTFAPFVAGMGHMSYRHFAAYNVIGALLWVLLFTYAGYFFGTI 181 



Query: 179 PFVKQHFSLI ILGIVFVTLIP VLISGVKSYR 209 

p V+ + L+I+GI+ V+++P +1 ++ R 
Sbjct: 182 PMVQDNLKLLIVGIIWSILPGVIEIIRHKR 212

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 400 

A DNA sequence (GBSx0435) was identified in S.agalactiae <SEQ ID 1297> which encodes the amino 
acid sequence <SEQ ID 1298>. Analysis of this protein sequence reveals the following: 

a N-terminal signal sequence 

Final Results

bacterial cytoplasm --- Certainty=0.3100 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 401 

A DNA sequence (GBSx0436) was identified in S.agalactiae <SEQ ID 1299> which encodes the amino 
acid sequence <SEQ ID 1300>. This protein is predicted to be DNA-entry nuclease. Analysis of this protein 
sequence reveals the following: 

o N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.3990 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9323> which encodes amino acid sequence <SEQ ID 9324> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAA38134 GB:X54225 membrane nuclease [Streptococcus pneumoniae] 
Identities = 87/157 (55%), Positives = 110/157 (69%), Gaps = 1/157 (0%) 

Query: 1 MLDRTIRQYQNRRDTTLPDANWKPLGWHQVAT-NDHYGHAVDKGHLIAYALAGNFKGWDA 59 

+L + RQY+NR++T +W P GWHQV Y HAVD+GHL+ YAL G G+DA 

Sbjct: 116 LLSKATRQYKmKETGNGSTSOTPPGraQVKNLKGSYTHAVDRGHLLGYALIGGLDGFDA 175 

Query: 60 SVSNPQNWTQTAHSNQSNQraiNRGQNYYESLVRKAVDQNKRVRYRvTPLYRNDTDLVPF 119 

S SNP+N+ QTA +NQ+ 4- + GQNYYES VRKA+DQNKRVRYRVT Y ++ DLVP 
Sbjct: 176 STSNPKNIAVQTAWANQAQAEYSTGQNYYESK77RKALDQNKRVRYRVTLYYASNEDLVPS 235 

Query: 120 AfflLEftKSQDGTLEFNVAIPNTQASYTMDYATGEITL 156 

A +FAKS DG LEFNV +PN Q +DY TGE+T+ 
Sbjct: 236 ASQIEftKSSDGEDEFNVLVPNVQKGLQLDYRTGEVTV 272 

A related DNA sequence was identified in S.pyogenes <SEQ ID 1301> which encodes the amino acid 
sequence <SEQ ID 1302>. Analysis of this protein sequence reveals the following:

Possible site: 42
>>> Seems to have a cleavable N-term signal seq.

Final Results

bacterial outside --- Certainty=0.3000 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

>GP:CAA38134 GB:X54225 membrane nuclease [Streptococcus pneumoniae] 
Identities = 89/135 (65%) , Positives = 104/135 (76%) , Gaps = 1/135 (0%) 

Query: 25 SPAGWHRLHHLKGSYDHAVDRGHLLGYALVGGLKGFDASTGNPDNIATQLSWANQANKPY 84 

+P GWH++ +LKGSY HAVDRGHLLGYAL+GGD GFDAST NP NIA Q +WANQA Y 
Sbjct: 138 TPPGWHQVKNLKGSYTHAVDRGHLLGYALIGGLDGFDASTSNPKNIAVQTAWANQAQAEY 197 

Query: 85 LTGQNYYEGLVRRALDKGHRVRYRVTLLY-DGDNLLASGSHLEAKSSDDSLTFNVFVPNV 143 

TGQNYYE VR+ALD+ RVRYRVTL Y ++L+ S S +EAKSSD h FNV VPNV 
Sbjct: 198 STGQNYYESKVRKALDQNKRVRYRVTLYYASN3DLVPSASQIEAKSSDGELEFNVLVPNV 257 

Query: 144 QAGLTADYRTGQIAI 158 

Q GL DYRTG++ + 
Sbjct: 258 QKGLQLDYRTGEVTV 272 

An alignment of the GAS and GBS proteins is shown below: 

Identities = 73/135 (54%) , Positives = 92/135 (68%) , Gaps = 2/135 (1%) 

Query: 24 PLGWHQVA-TOTJHYGHAVDKGHLIAYALAGNFKGWDASVSNPQIWvTQTAHSNQSNQKIN 82 

P GWH++ Y HAVD+GHL+ YAL G KG+DAS HP N+ TQ + +NQ+N+ 

Sbjct: 26 PAGWHRLHHLKGSYDHAVDRGHI^YALVGGLKGFDASTGNPDNIATQLSWANQAKKPYL 85 

Query: 83 RGQNYYESLWKAVDQNKKVRYRVTPLYRNDTDLVPFAMHLESUCSQDGTIiEFlIVAIPKTQ 142 

GQNYYE LVR+A+D+ RVRYRVT IiY D +L+ HLEAKS D +L FNV +PN Q 
Sbjct: 86 TGQl^YYEGLTORALDKGHRWYRVTLLYDGD-NLLASGSHLEAKSSDDSLTFNVFVPNVQ 144 

Query: 143 ASYTMDYATGEITLN 157 

A T DY TG+I +N 
Sbjct: 145 AGLTADYRTGQIAIN 159 

SEQ ID 9324 (GBS656) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 186 (lane 10; MW 57kDa). 

GBS656-GST was purified as shown in Figure 236, lane 4. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 402 

A DNA sequence (GBSx0437) was identified in S.agalactiae <SEQ ID 1303> which encodes the amino 
acid sequence <SEQ ID 1304>. Analysis of this protein sequence reveals the following: 

Possible site: 13 

>>> Seems to have a cleavable N-term signal seq. 

Final Results 

bacterial outside --- Certainty=0.3000 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9321> which encodes amino acid sequence <SEQ ID 9322>
was also identified. 

The protein has no significant homology with any sequences in the GENPEPT database. 

A related DNA sequence was identified in S.pyogenes <SEQ ID 1305> which encodes the amino acid 
sequence <SEQ ID 1306>. Analysis of this protein sequence reveals the following: 

Possible site: 47 

>>> Seems to have no N-terminal signal sequence



Final Results 

bacterial cytoplasm --- Certainty=0.5350 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 24/73 (32%) , Positives = 37/73 (49%) , Gaps = 2/73 (2%) 

Query: 1 MFYMKLANRLSLAATIVNFANANSPFGIIIHSDKAENVF.WKTO 60 

+ YMKLA L TI+ E + SPF I+H+D A N++ E N +++P 

Sbjct: 80 ILYMKLAKENHLPVTIITETHMTSPFAFILHTDHAINLKETRLEVILKQTKNDQIjSKQTP 139 

Query: 61 K--KSLWQHFFSQ 71 

+ KS W+ F + 
Sbjct: 140 EKTKS FWKRFLKK 152 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 403 

A DNA sequence (GBSx0438) was identified in S.agalactiae <SEQ ID 1307> which encodes the amino 
acid sequence <SEQ ID 1308>. This protein is predicted to be Isopentenyl-diphosphate delta-isomerase. 
Analysis of this protein sequence reveals the following: 

Possible site: 39

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.1649 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAG20030 GB:AE005083 isopentenyl pyrophosphate isomerase; Idi 
[Halobacterium sp. NRC-1] 
Identities = 24/77 (31%) , Positives = 40/77 (51%) 

Query: 14 TGLTUm)QNIPQGLFHLVvDVILFHEDGDVLM1KRHPKKKAFPAYFEATAGGSALKGEN 73 

TGL D + G+ H +LF EDG VL+ +R +K+ + +++ T ++G++ 

Sbjct: 42 TGLANRLDAHTGDGWHRAFTCLLFDEDGRVLIAQRADRKRLWDTHVTOGTVASHPIEGQS 101 

Query: 74 AKQAILRELKEETGIVP 90 

,A + L EE GI P 
Sbjct: 102 QVDATRQRLAEELGIEP 118 

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 404 

A DNA sequence (GBSx0439) was identified in S.agalactiae <SEQ ID 1309> which encodes the amino 
acid sequence <SEQ ID 1310>. This protein is predicted to be phosphoserine phosphatase (serB). Analysis 
of this protein sequence reveals the following: 
Possible site: 35 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.0513 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database:

>GP:CAB50876 GB:AL096844 putative phosphoserine phosphatase 
[Streptomyces coelicolor A3 (2) ] 
Identities = 96/193 (49%) , Positives = 132/193 (57%) 



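[Most rows of this alignment are illegible. Query blocks start at residues 5, 65, 125 and 185; Sbjct blocks start at residues 183, 243, 303 and 363. Legible fragments follow.]

TPGA LI+ + +VG+VSGGF -f

G LTG V GE1V + K L+ +A+ + LSQT+A+GDGANDL M+ +AG+G+AF A

KP+VRE A +N
Sbjct: 363 KPVVREAAHTAVN 375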

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics. 

Example 405 

A DNA sequence (GBSx0440) was identified in S.agalactiae <SEQ ID 1311> which encodes the amino
acid sequence <SEQ ID 1312>. Analysis of this protein sequence reveals the following:

Possible site: 23

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood =-17.88 Transmembrane 5 - 21 ( 1-29) 



Final Results 

bacterial membrane --- Certainty=0.8153 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 



>GP:BAB06924 GB:AP001518 unknown conserved protein [Bacillus halodurans]
Identities = 122/553 (22%) , Positives = 265/553 (47%) , Gaps = 12/553 (2%)
[Most rows of this alignment are illegible. Query blocks start at residues 7, 67, 127, 187, 247, 307, 364, 424, 484 and 544; the legible Sbjct block starts are 3, 63, 123, 183, 243, 303, 360, 480 and 535. Legible fragments follow.]

Query: 7 LLLVAI VLLVT IAYWG WIRKRNDTL XR ^VOEEIEQVKLLHLIGQSQ 66

+++ ++++L + +V G + RK + LE K +++ P+ +EI +VK L + G+++

Sbjct: 3 IWFSLLVLTVTFFVYGALRRKAFYJCRVDKLEDWKM)ILQRPIPDEIGKVKGLm!

t- ++VL + EE+N

G+ I+ASEVL +A+E + + + +P + +L+ + P +L +L+ G R 4

E++ ++ E +A++

KR ++K N+PG+P+

Query: 484 TlWAIANLEQATyLWQDATLTEQLLQySNRYKSFEQNVQKSFEQALYLFSVEHNYKftSF 543

I + ++ A I> E ++QY NRYRS V+K A LF +
Sbjct: 480 AQGLIHENSSILHETIEKARnAEHVIQYGKRynSRSAEVKKRLSNAEELFSA FEY 534



A related DNA sequence was identified in S.pyogenes <SEQ ID 1313> which encodes the amino acid
sequence <SEQ ID 1314>. Analysis of this protein sequence reveals the following:

Possible site: 23
>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood =-18.04 Transmembrane 5 - 21 ( 1-29)

Final Results

bacterial membrane --- Certainty=0.8217 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

>GP:BAB06924 GB:AP001518 unknown conserved protein [Bacillus halodurans] 
Identities = 131/555 (23%) , Positives = 269/555 (47%) , Gaps = 16/555 (2%) 

Query: 7 LLIVAIVLLVIIAYLVGVIIRKRNDSLITSLE3RKQALFALPVNDEIEEVKSLHLIGQSQ 66 

+++ ++++L + ++ G + RK + LE+ K + P+ DEI +VK L + G+++ 

Sbjct: 3 IWFSLLVLTVTFFvYGALRRKAFYKRVDKLEDWKNDILQRPIPDEIGKVKGLTMSGETE 62 

Query: 67 TSFREWNQKWVDLTVNSFADIENHIFEAEOTJOTFOTIRAKHEINSX^SQL^VEEDIAS 126 

F W W D+ ++E +F+ E+ + + F +AK ++++E +L+ +EE + 

Sbjct: 63 EKFEVmSDWDDIVGVILPNVEEQLFDVEDFANKYRFQKMCALLDTIEQRLHSIEEQLKI 122 



Query: 127 IREALNILKEQEEKNSARVTHALDLYEKLQASISENEDNFGSTMPEIDKQMKNIETEFSQ 186 
+ + + +L + EE+N + +L +KL + S+ D++++
Sbjct: 123 MVDDIQVLVQSREQNRTKIGSVRELQQKLIKEAITRRGSLSSSAKVFDEKLEKANELLQR 182

Query: 187 FVALKSSGDPVEftSEVLDRAEEHTIflLGQITEQIPAIVAKLEDDFPDQLDDLETGYRRLL 246 

F G+ ++ASEVL+ A+E + + + +P + +L+ + P +L +L+ G R +- 

Sbjct: 183 FDERTEKGNYIQASEVLEEAKELLGQIEHIJjKIVPGLPVELQTNIPAEIjTNLKNGLRDME 242 

Query: 247 EENYHFPEKNIEARFQEIRESIRANSSELVTLDLDRAREENTHIQERIDSLYEVFEREIA 306 

E + + + E +L L+ + EE I+E ++ ++E+ E+E 

Sbjct: 243 EAGFFLETFAIDSQMERLEEKRVELLEQLTVLECMGMEEEIKFIEESMEQMFELLEKE-- 300 

Query: 307 AYKVAAKN- - SKMLPRYLEHVKRHNEQ IiKDEIARLSRKYILSETESLTVKAFEKDIK 361 

V AKN + +LP E + + E+ LK+E + Y L+E E + + K++K 
Sbjct: 301 ---VEAKNEITILLPNLREDLTKTEEKLTHLKEETESVQLSYRIAEEELVFQQKLGKELK 357 



Query: 422 NLDVYVTQLHMIKRYMEKRHLPGIPQDFLSAFFITSSQLEALMDELSRGR1NIEAVSRLS 481 

L +L KR ++K ++PG+P+ L +L + +LS + + V+ h 

Sbjct: 418 ELKQLKEKLLEDKRLVQKSNIPGLPETLLHRLEDGEQKLAQAIAKLSDVPLEMGRVTALV 477 

Query: 482 EVATVAIANLEDLTYQWQNATLTEQLLQYSNRYRSFEAGVQSSFEHALRLFEVENDYQA 541 

+ A I +++++ALE ++QY NRYRS A V+ +A LF 
Sbjct: 478 DEAQGLIHENSSILHETIEKARLAEHVIQYGNRYRSRSAEVKKRLSNAEELFRA F 532 

Query: 542 SFDE- ISYALETVEP 555 

+DE I A++ +EP 
Sbjct: 533 EYDEAIEMAVQAIEP 547 

An alignment of the GAS and GBS proteins is shown below: 
Identities = 429/574 (74%), Positives = 503/574 (86%) 

Query: 1 MSSGI ILLLVAIVLLVI IAYWGWIRKRNDTLIANLETRKQELVDLPVQEEIEQVKLLH 60 

MSSGIILL+VAIVLLVIIAY+VGV+IRKRND+LI +LE RKQ Ii LPV +EIE+VK LH 
Sbjct: 1 MSSGIILLIVAIVLLVI IAYLVGVI IRKRNDSLITSLEERKQALFALPVNDEIEEVKSLH 60 

Query: 61 LIGQSQSTFREWNQKWTDLSTNSFKDIDFHLVEAEM^SFNFVRAKHEIDNVDSQLTII 120 

LIGQSQ++FREWNQKW DL+ NSF DI+ H+ EAENtiND+FNF+RAKHEI++V+SQL ++ 
Sbjct: 61 LIGQSQTSFREWNQKWVDLTWTSFADIEiraiFEAENLHDTFNFIRAKHEINSVESQIJOTjV 120

Query: 121 EEDIVSIREALEVLKEQEEKNSARVTHALDLYETLQKSISEKEDNYGTTMPEIEKQLKNI 180 

EEDI SIREAL +LKEQEEKNSARVTHALDLYE LQ SISE EDN+G+TMPEI +KQ+KNI 
Sbjct: 121 EEDIASIREALNILKEQEEKNSARVTHALDLYEKLQASISENEDNFGSTMPEIDKQMKNI 180 

Query: 181 EAEFSHFVTLNSTGDPIEASEVLNKAEEHTIALGQITEQIPAIVAKLEDDFPDQLDDLET 240 

E EFS FV LNS+GDP+EASEVL++AEEHTIALGQITEQIPAIVAKLEDDFPDQLDDLET 
Sbjct: 181 ETEFSQFVALNSSGDPVEASEVLDRAEEHTIALGQITEQIPAIVAKLEDDFPDQLDDLET 240 

Query: 241 GYRRLLEENYHFPEKDIEQRFQEVREAIRSNSCGLVSLDLDRARDENEHIQEKIDKLYDI 300 

GYRRLLEENYHFPEK+IE RFQE+RE+IR+NS LV+LDLDRAR+EN HIQE+ID LY++ 
Sbjct: 241 GYRRLLEENYHFPEKHIEARFQEIRESIRANSSELVTLDLDRAREENTHIQERIDSLYEV 300 

Query: 301 FEREIAAYKVAHKDSKIIPQFIAHAKSmEQLGHEIKRLSAKYIIjNENESLSLRSFTNDL 360 

FEREIAAYKVA K+SK++P++L H K MNEQL EI RLS KYIL+E ESL++++F D+ 
Sbjct: 301 FEREIAAYKVAAraSKMLPRYLEHVKRNNEQLKDEIARLSRKYILSETESLTVKAFEKDI 360 

Query: 361 EEIETKVLPSVENFGQEASPYTHLQILFERTLKTLTTVEENQMEVFEAVKTIESVETRAR 420 

+EIE L E FG + P++ LQ+ FER++KTLT VE QM+VF AVK IE +E++AR 
Sbjct: 361 KEIEDSTIAVAEQFGLQEKPFSELQVTFERSIICTLTNVESGQMDVFAAVKDIEKIESQAR 420

Query: 421 QNMDKYWKLHMIKRFMEKRNLPGIPQDFLSTFFTTSSQIEALINELSRGRIDIEAVSRL 480 

N+D YV +LHMIKR+MEKR+LPGIPQDFLS FFTTSSQ+EAL++ELSRGRI+IEAVSRL 
Sbjct: 421 HNLDVYVTQLHMIKRYMEKRHLPGIPQDFLSAFFTTSSQLEALMDELSRGRINIEAVSRL 480

Query: 481 OTVTIHAIANLEQATYLVVQDATIiTEQLLQYSKRYRSFEQNVQKSFEQALYLFEVEHNYK 540 
++V T AIANLE TY WQ+ATLT3QLLQYSNRYRSFE VQ SFE AL LFEVE++Y+ 
Sbjct: 481 SEVATVAIANLEDLTYQWQHATLrEQLLQYSNEYRSPEAGVQSSPEHALRLFEVENDYQ 540

Query: 541 ASFDEISYALETVEPGVTDRFVTSYEKTQERIRF 574

ASFDEISYALETVEPGVTDRFV SYEKT+E IRF
Sbjct: 541 ASFDEISYALETVEPGVTDRFVNSYEKTREHIRF 574

SEQ ID 1312 (GBS642) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 142 (lane 2-4; MW 27kDa). 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 406 

A DNA sequence (GBSx0441) was identified in S.agalactiae <SEQ ID 1315> which encodes the amino 
acid sequence <SEQ ID 1316>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2471 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9671> which encodes amino acid sequence <SEQ ID 9672> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAA91553 GB:Z67740 DNA gyrase [Streptococcus pneumoniae]
Identities = 574/650 (88%) , Positives = 618/650 (94%) , Gaps = 2/650 (0%) 

Query: 1 MTEETKNMEQRAQEYDASQIQVLEGLEAVRMRPGWYIGSTSKEGLHHLVWEIVDNSIDEA 60
MTEE KN++ AQ+YDASQIQVLEGLEAVRMRPGMYIGSTSKEGLHHLVWEIVDNSIDEA

[The rows for Query residues 61 to 480 are only partly legible. Query blocks start at residues 61, 121, 181, 241, 301, 361 and 421; Sbjct blocks start at residues 59, 119, 179, 239, 299, 359 and 419. Legible fragments follow.]

LAGFA HI+V+ IEPD+S ITWDDGRGI PVD1QEKTGRPAVETVFTVLHAGGKFGGGGYKV

SGGLHGVGSSWNALSTQLDV V+KNGK-t-HYQEY+RG W DLE++GDTD +GTTVHFTP

DPEIFTETT+FDFDKL KRIQELAFLNRGL+ISI+DKR+G E X YHYEGGI SYVE+I

I + VE VAMQYTTGY E VMSFANNIHTHEGGTHEQGFRTAL

TRVINDYA+KtJK+LK+NEDNLTGEDVREGLTAVISVKHPNPQFEGQTKrKLGNSEVVKIT

NRLFSEAF+ FL4FJJPQ+AK+1WKGI1A+KAR+AAKRAREVTRKKBGLEISNLPGKLAD
Sbjct: 359 NRLFSEAFSDFLMENPQIAKRIVEKGILAAKARVAAKRAREVTRKKSGLEISNLPGKLAD 418

Query: 421 CSSNNAEMNELFIVEGDSAGGSAKSGRNREFQAILPIRGKItNVEKAT^KILANEEIRS 480
Query: 481 LFTAMGTGFGADFDVSKVRYQKLVIMTDADVDGAHIRTLLLTLIYRPMRPVLEAGYVYIA 540

LFTAMGTGFGA+FDVSK RYQKLV+MTDADVDGAHIRTLLLTLIYR+M+P+LEAGYVYIA 
Sbjct: 479 LFTAMGTGFGAEFDVSIC&RYQIOiVLMTDADVDGAHIRTLLLTLIYRYMKPILEAGYVYIA 538 

Query: 541 QPPIYGVKVGSEIKAYIQPGVNQEEELRQALDTYSSGRSKPTVQRYKGLGEMDDHQLWET 600 

QPPIYGVKVGSEIK YIQPG +QE +L++AL YS GR+KPT+QRYEGLGEMDDHQIiWET 
Sbjct: 539 QPPIYGVKVGSEIKEYIQPGADQEIKLQEALARYSEGRTKPTIQRYKGLGEMDDHQLWET 598 

Query: 601 TMDPENRLMARVSVDDAAEADKIFDMLMGDRVEPRREFIEANAVYSNLDI 650

TMDPE+RLMARVSVDDAAEADKIFDMLMGDRVEPRREFIE NAVYS LD+
Sbjct: 599 TMDPEHRLMARVSVDDAAEADKIFDMLMGDRVEPRREFIEENAVYSTLDV 648 

A related DNA sequence was identified in S.pyogenes <SEQ ID 1317> which encodes the amino acid
sequence <SEQ ID 1318>. Analysis of this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1698 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 584/650 (89%) , Positives = 618/650 (94%) 



[Most rows of this alignment are illegible. Query and Sbjct blocks both start at residues 1, 61, 121, 181, 241, 301, 361, 421, 481 and 541. Legible fragments follow.]

LAG FA HIKV+IE DNSITWDDGRGIPVDIQ KTGRPAVETVFTVLHAGGKFGGGGYKV

SGGLHGVGSSVVNALSTQLDV+VYKNG++HYQE++RG W DLE+IG TD++GTTVHFTP

DPEIFTETT FD+ LAKRIQELAFLNRGL+ISI+DKR G E E+ + YEGGIGSYVEF+

Query: 241 ...QGFRTAL 300
rDGEL+GI+VEVAMQYTT YQETVMSFANNIHTHEGGTHEQGFR AL
Sbjct: 241 ...QGFRAAL 300

TRVINDYAKKNKILKENEDNLTGEDVREGLTAVISVKHPKPQFEGQTKTKLGNSEVVKIT

NRLFSFAF RFLLENPQVA+KIVEKGIIASKARIAAKRAREVTRKKSGLEISHLPGKLAD

CSSN+A I^LFIVEGDSAGGSAKSGRNREFQAILPIRGKILNVFIKATMDKIIjANEEIRS

QPPIYGVKVGSEIK YIQPG++QE++L+ AL+ YS GRSKPTVQRYKGLGEMDDHQLWET
Query: 601 TMDPENRLMARVSVDDAAEADKIFDMLMGDRVEPRREFIEANAVYSNLDI 650

TMDPENRLMARV+VDDAAEADK+FDMLMGDRVEPRR+FIE NAVYS LDI
Sbjct: 601 TMDPENRLMARVTVDDAAEADKVFDMLMGDRVEPRRDFIEENAVYSTLDI 650

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 407 

A DNA sequence (GBSx0442) was identified in S.agalactiae <SEQ ID 1319> which encodes the amino 
acid sequence <SEQ ID 1320>. Analysis of this protein sequence reveals the following:

Possible site: 26

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3186 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database:

>GP:CAA91552 GB:Z67740 unidentified [Streptococcus pneumoniae]

Identities = 82/142 (57%) , Positives = 105/142 (73%) 



Query: 45 LKESTADAIAYFIPEEADFLKEYKANEAK\n^ETPILFQGAKELLAKIQRQGSRNFLVSHR 104 
LK ST AI F P +FL++YK NEA+ LE PILF+G +LL I QG R+FLVSHR 
Sbjct: 2 LKVSTPFAIETFAPNLENFLEKYKENEARELEHPILFEGVSDLLEDILNQGGRHFLVSHR 61

Query: 105 DNQVIVILEKTEIIDYFTEWTADNGFSRKPSPESMLYLKEKYQIDNCLVIGDRDIDKQA 164

++QV+ ILEKT I YFTEWT+ +GF RKP+ PESMLYL+EKYQI + LVIGDR ID +A
Sbjct: 62 NDQVLEILEKTSIAAYFTEWTSSSGFKRKPNPESMLYLREKYQISSGLVIGDRPIDIEA 121

Query: 165 GESAGFDTLLVDGSKSLMEIIE 186

G++AG DT L +L ++++

Sbjct: 122 GQAAGLDTHLFrSIVNLRQVLD 143



A related DNA sequence was identified in S.pyogenes <SEQ ID 1321> which encodes the amino acid
sequence <SEQ ID 1322>. Analysis of this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2472 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below:

Identities = 122/185 (65%) , Positives = 145/185 (77%) 



Query: 1 MRYHDYIWDLGGTLLDNYESSTRAFVETLKEFGYQADHDSVYQKLKESTADAIAYFIPEE 60 

MNY DYIWDLGGTLLDNYE ST+AFV+TL F DHD4-VYQKBKESTA A+A F P E 

Sbjct: 4 MNYQDYIWDLGGTLLDNYELSTQAFVQTIAFFSLPGDHDAWQKLICESTAIAVAMFAPNE 63 

Query: 61 ADFLKEYKANEAKVLETPILFQGAKELLAKIQRQGSRNFLVSHRDNQVIVILEKTEIIDY 120

+FL Y+ EA L PI GAKE+L KI GSRNFL+SHRD QV +LE+ ++ Y 

Sbjct: 64 PEFLHVYRRJEADKlAQPIWCMAKEILGKIATSGSRNFLISHRDCQVNQIiLEQAGLLrY 123 

Query: 121 FTEWTADNGFSRKPSPESMLYLKEKYQIDNCLVIGDRDIDKQAGESAGFDTLLVDGSKS 180 

FTEWTA NGF+RKP+PES+ YLKEKY I++ LVIGDR IDKQAG++AGF+TLLVDG K+ 

Sbjct: 124 FTEWTASNGFARKPNPESLFYLKEKYDINSGLVIGDRLIDKQAGQAAGFNTLLVDGRKN 183 
Query: 181 LMEII 185
L+EI+ 

Sbjct: 184 LLEIV 188 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 408 

A DNA sequence (GBSx0443) was identified in S.agalactiae <SEQ ID 1323> which encodes the amino 
acid sequence <SEQ ID 1324>. This protein is predicted to be stage V sporulation protein E (rodA).
Analysis of this protein sequence reveals the following:

Possible site: 42

>>> Seems to have a cleavable N-term signal seq.

INTEGRAL Likelihood =-11.15 Transmembrane 206 - 222 ( 177 - 225)
INTEGRAL Likelihood = -5.14 Transmembrane 58 - 74 ( 50 - 82)
INTEGRAL Likelihood = -3.34 Transmembrane 182 - 198 ( 177 - 205)
INTEGRAL Likelihood = J.55 Transmembrane 158 - 174 ( 156 - 177)
INTEGRAL Likelihood = -3.12 Transmembrane 300 - 316 ( 299 - 324)
INTEGRAL Likelihood = !.66 Transmembrane 86 - 102 ( 83 - 102)
INTEGRAL Likelihood = -2.34 Transmembrane 338 - 354 ( 338 - 357)



Final Results 

bacterial membrane --- Certainty=0.5458 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9669> which encodes amino acid sequence <SEQ ID 9670> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAB15838 GB:Z99123 alternate gene name: ipa-42d~similar to
cell-division protein [Bacillus subtilis]
Identities = 142/392 (36%), Positives = 237/392 (60%), Gaps = 23/392 (5%) 



[Most rows of this alignment are illegible. Query blocks start at residues 10, 69, 127, 186, 246, 303 and 361; the legible Sbjct block starts are 7, 60, 117, 175, 233 and 353. Legible fragments follow.]

EFMK+ I+ML+ + + K +T +DD LL + G+ +PV ++LM +D GTA 4

I- +SGI+W +1 I + +L 1+ L++ I N + ++G+ YQI R+++

GIPL F+S GGSS LS LIG G+V tSQT
There is also homology to SEQ ID 1028.

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 409 

A DNA sequence (GBSx0444) was identified in S.agalactiae <SEQ ID 1325> which encodes the amino 
acid sequence <SEQ ID 1326>. Analysis of this protein sequence reveals the following: 

Possible site: 25 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3195 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
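
Localization reports of this kind are regular enough to read mechanically. The following is a minimal sketch, assuming only the line layout visible in this document ("bacterial <site> --- Certainty=<score> (<call>)"); the function names and the regular expression are hypothetical and are not taken from the prediction tool itself. It tolerates the stray spaces inside the score that appear in some copies of the report.

```python
import re

# Hypothetical pattern modelled on the "Final Results" lines in this document.
# Separators may be hyphens, an em dash (\u2014), or missing altogether.
LINE_RE = re.compile(
    r"bacterial\s+(?P<site>\w+)[\s\-\u2014]*Certainty\s*=\s*"
    r"(?P<score>[\d.\s]+?)\s*\((?P<call>[^)]+)\)"
)


def parse_final_results(text):
    """Return {site: (certainty, call)} for every report line found in text."""
    results = {}
    for match in LINE_RE.finditer(text):
        # Remove spaces that extraction sometimes leaves inside the number.
        score = float(match.group("score").replace(" ", ""))
        results[match.group("site")] = (score, match.group("call"))
    return results


def best_site(results):
    """Return the site with the highest certainty score."""
    return max(results, key=lambda site: results[site][0])


example = """
bacterial cytoplasm --- Certainty=0.3195 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
"""
parsed = parse_final_results(example)
print(best_site(parsed), parsed[best_site(parsed)])
```

Picking the maximum score is only one possible reading of the report; the document itself marks the preferred location with the "(Affirmative)" call.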

The protein has no significant homology with any sequences in the GENPEPT database. 

A related DNA sequence was identified in S.pyogenes <SEQ ID 1327> which encodes the amino acid 
sequence <SEQ ID 1328>. Analysis of this protein sequence reveals the following: 

Possible site: 22 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2735 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 38/55 (69%) , Positives = 48/55 (87%) 

Query: 8 DEFKE^IDKGYISGNWAIVRKNGKIFDYVLLHEEVREEEVVTVERVLDVLRKLS 62

DEFK+AID GYI +G+TVAIVRK+G+I FDYVL HE+V+ EWT E+V +VL +LS
Sbjct: 5 DEFKQAIDNGYIAGDTOAIVRKDGQIFDYVLPHEKAT^GEVVTKEKVEEVLVELS 59

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 410 

A DNA sequence (GBSx0445) was identified in S.agalactiae <SEQ ID 1329> which encodes the amino 
acid sequence <SEQ ID 1330>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4241 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 

A related DNA sequence was identified in S.pyogenes <SEQ ID 1331> which encodes the amino acid
sequence <SEQ ID 1332>. Analysis of this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4551 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 57/66 (86%), Positives = 63/66 (95%) 



Query: 1 MSQEKLKSKLDQAKGGAKEGFGKITGDKELEAKGFIEKTIAKGKELADDAKDAVEGRVDA 60 

MS+EKLKSK++QA GG KEG GK+TGDKELFAKGF+EKTIAKGKELADDAK+AVEGAVDA 
Sbjct: 1 MSEEKLKSKIEQASGGLKEGAGKLTGDKELEAKGF\rEKTIAKGKELADDAKEAVEGAVDA 60 

Query: 61 VKEKLK 66 

VKEKLK 
Sbjct: 61 VKEKLK 66 
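
The summary line that heads each alignment can be checked against its own counts. The sketch below is an illustration only: it assumes the "Identities = a/b (p%)" layout used throughout this document, and the names `STAT_RE` and `parse_alignment_summary` are hypothetical. It reports the raw ratio next to the printed percentage without assuming how the original program rounded; in this document the printed integer is sometimes a point below the raw ratio.

```python
import re

# Hypothetical pattern for "Identities = 57/66 (86%)"-style fields.
STAT_RE = re.compile(
    r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)\s*\(\s*(\d+)\s*%\s*\)"
)


def parse_alignment_summary(line):
    """Return {field: {count, length, reported_pct, ratio}} for one summary line."""
    stats = {}
    for name, count, length, pct in STAT_RE.findall(line):
        count, length = int(count), int(length)
        stats[name] = {
            "count": count,
            "length": length,
            "reported_pct": int(pct),
            "ratio": count / length,  # raw fraction, no rounding applied
        }
    return stats


summary = "Identities = 57/66 (86%), Positives = 63/66 (95%)"
stats = parse_alignment_summary(summary)
for name, s in stats.items():
    print(f"{name}: {s['count']}/{s['length']} = {100 * s['ratio']:.1f}% "
          f"(reported {s['reported_pct']}%)")
```

A large gap between the raw ratio and the printed percentage is a useful flag for a digit that was misread during extraction.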

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 411 

A DNA sequence (GBSx0447) was identified in S.agalactiae <SEQ ID 1333> which encodes the amino 
acid sequence <SEQ ID 1334>. This protein is predicted to be TnpA (orfB). Analysis of this protein 
sequence reveals the following: 

Possible site: 16 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3961 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9667> which encodes amino acid sequence <SEQ ID 9668> 
was also identified. 

A related DNA sequence was identified in S. pyogenes <SEQ ID 1335> which encodes the amino acid 
sequence <SEQ ID 1336>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3365 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 152/160 (95%) , Positives = 154/160 (96%) 

Query: 1 MKNMALPKmTVKTKTALKKTQK^ 60 

MKNMALPKMATVK KTALK+TQKTYPQNLLNQKFNPDKPNQVWSTDFTYISIGYKKYVYL 
Sbjct: 194 MKNMALPKMAIVKPKTALKRTQKIYPQNLINQKF^PDKPNQWSTDFTYISIGYKKVVYL 253 

Query: 61 CAIIDLYSRKYIAWKLSHROTAKIACDTliEIiAIiNKRKIEGTLLFHSDCGSQFKAREFRKI 120 

CAI+DLYSRK IAWKLSHRMDAKLACDTLELALNKRKlEGTLLFHSDQGSQFKARE RKI 
Sbjct: 254 CAILDLYSRKCIAWKLSHRMDAKrACDTLEL&LNKRKIEGTLLFHSDQGSQFKARELRKI 313 
Query: 121 IDDNNIMHSFSKPRYPYDNAVTEAFFKYLKKRQINQKNYQ 160

IDDN IMHSFSKP YPYDNAVTEAFFKYLKHRQINQK YQ 
Sbjct: 314 IDDNTIMHSFSKPGYPYDNAVTEAFFKYLKHRQINQKKYQ 353 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 412 

A DNA sequence (GBSx0448) was identified in S.agalactiae <SEQ ID 1337> which encodes the amino 
acid sequence <SEQ ID 1338>. Analysis of this protein sequence reveals the following: 

Possible site: 27 

»> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0. 1090 (Affirmative) < suco 

bacterial membrane Certainty=0 . 0000 (Not Clear) < suco 

bacterial outside Certainty=0 . 0000 (Not Clear) < suco 

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 413

A DNA sequence (GBSx0449) was identified in S.agalactiae <SEQ ID 1339> which encodes the amino 
acid sequence <SEQ ID 1340>. This protein is predicted to be histidine kinase (resE). Analysis of this 
protein sequence reveals the following: 

Possible site: 40 

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood =-11.57 Transmembrane 17 - 33 ( 6 - 38)
INTEGRAL Likelihood = -4.67

Final Results

bacterial membrane --- Certainty=0.5628 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAD25109 GB:AF140356 VncS [Streptococcus pneumoniae] 
Identities = 178/435 (40%), Positives = 281/435 (63%), Gaps = 1/435 (0%) 

Query: 1 MKKLKIFPKMFIQIFSILGILIILVHSLFFFIFPKTYLETRKVKIHIMADEISKNMNGKE 60

MK+ +F K+FI FSI +L+I +H +F+FP TYL R+ I A I++++ GK+
Sbjct: 1 MKRTGLFAKIFIYTFSIFSVLVICLHIAIYFLFPSTYLSHRQETIGQKATAIAQSLEGKD 60

Query: 61 LKYLDQTLELYSKSSDIKVFIKKNNNKNELQINDNINVNVKSDSNSLIIEEREIKLHDGK 120
+ ++Q L+LYS++SDIK +K +++L++ D++ ++ + SL IEERE+K DG

Sbjct: 61 RQSIEQVLDLYSQTSDIKGTVKGEMTEDKLEVKDSLPLDTDRQTTSLFIEEREVKTQDGG 120

Query: 121 KIHLQFVSTADMQKDAKDLSLKFLPYSLSISFLFSIVISLIYAKSIKNNIQEITMVTDKM 180
+ LQF+++ D+QK+A+ +SL+FLPY+L SFL S++++ IYA++I I EI VT +M
Sbjct: 121 TMILQFLASMDLQKEAEQISLQFLPYTLLASFLISLLVAYIYARTIVAPILEIKRVTRRM 180

Query: 181 IKLDKETRLKISSNDEIGQLKQQINDLYCALLNTINDLEFKNKEILKLEKLKYDFFKGAS 240

+ LD + RL++ S DEIG LK+QIN LY LL I DL KN+ IL+LEK+K +F +GAS
Sbjct: 181 MDLDSQVRLRVDSKDEIGNLKEQINSLYQKLLWIADLHEKNEAILQLEKMKVEFLRGAS 240

Query: 241 HELKTPLSSLKILLENMKYNIGKYKDRDFYISECINIVDNLTKNVSQILSEYSIKDLNND 300

HELKTPL+SLKIL+ENM+ NIG+YKDRD Y+ + IVD L +V QILS S+++L +D
Sbjct: 241 HELKTPLRSLKILIENMRENIGRYKDRDQYLGVALGIVDELNHHVLQILSLSSVQELRDD 300

Query: 301 EEYLNVGDTLDEVLEKYSILVNQKKININKELLDYNIYIGKTALNIVFSNLISNAVKYTN 360

E +++ +++ Y++L ++++ I+ L Y+ + + ++ SNLISNA+K++

Sbjct: 301 RETIDLLQMTQNLVKDYALLAKERELQIDNSLTHQQAYMPSWKLILSNLISNAIKHSV 360

Query: 361 RNGIINIKIANDWLLIENSYDKNKISKINKILDASFDLKLDNSNGLGLNIVKNILNKYNI 420

G++ I L IENS + K+ + + K+ S G+GL +VK++L +

Sbjct: 361 PGGLVRIGEREGELFIENSCSSEEQEKLAQSFSDKASRIWKGS-GMGLFWKSLLEHEKL 419

Query: 421 KYEILHGENYFIFKI 435 

Y EN F I 

Sbjct: 420 AYRFEMEENSLTFFI 434 

A related DNA sequence was identified in S. pyogenes <SEQ ID 1341> which encodes the amino acid 
sequence <SEQ ID 1342>. Analysis of this protein sequence reveals the following: 

Possible site: 37 

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood =-11.83 Transmembrane 14 - 30 ( 6 - 35)
INTEGRAL Likelihood = -2.44 Transmembrane 157 - 173 ( 156 - 174)

Final Results

bacterial membrane --- Certainty=0.5734 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases:

>GP:AAD25109 GB:AF140356 VncS [Streptococcus pneumoniae]
Identities = 123/455 (27%), Positives = 223/455 (48%), Gaps = 23/455 (5%) 

Query: 3 LIKKTFLVINGLIIWVTSILLVLYFAMPIYYTKVKDKEVKCEFDQTSKQIKGKTVTEIR 62
L K F+ + V+V + L +YF P Y + + + + ++ ++GK I
Sbjct: 6 LFAKIFIYTFSIFSVLVICLHLAIYFLFPSTYLSHRQETIGQKATAIAQSLEGKDRQSIE 65

[The remaining rows are only partly legible. Query blocks start at residues 63, 123, 183, 243, 303, 363 and 423; Sbjct blocks start at residues 66, 112, 172, 232, 292, 351 and 404. Legible fragments follow.]

K EFLR SHELKTP+ S+ +1+ M N+G + DRD+YL

++ GG V I 4+ +L 1+N + 44 44L Q F

Sbjct: 351 LISNAIKHSVPGGLVRIGEREGELFIENSC SSEEQEKLAQSF- -

G4GLF4 4L4 LAYRF 444
An alignment of the GAS and GBS proteins is shown below:

Identities = 108/454 (23%) , Positives = 220/454 (47%) , Gaps = 22/454 (4%) 

Query: 4 LKIFPKMFIQIFSILGILIILVHSLFFFIFPKTYLETRKVKIHIMADEISKNMNGKELKY 63

+++ K F+ I ++ +++ + + +F P Y + + ++ D+ SK + GK + 
Sbjct: 1 TOLIKKTFLVIWGLIIVVVTSILLVLyFAMPIYYTKVKDKEVKCEFDQTSKQIKGKTVTE 60 

Query: 64 LDQTLELYSKSSDIKVFIKKNNNK NELQINDNINVNVKSDSN--SLII 109 

+ L +1 + ++N+ +E + + N+N+ D++ ++ +

Sbjct: 61 IPJDILTKKINKDNIWYSLVDSDNQLLYPSLQIXDGVSESKDSQNVIJIVTTFDNSYSNV'KV' 120 

Query: 110 EEREIKLHDGKKIHLQFVSTADMQKDAKDLSLKFLPYSLSISFLFSIVISLIYAKSIKNN 169 
+++ L DGKK+ L S+ DA + L P L S +++ +Y+++ 

Sbjct: 121 MSQKVTLRDGKKMTLLGQSSLQPVTDASKVLLDLYPSLLIFSVTVGSIVAYLYSRTSSRR 180

Query: 170 IQEITMVTDKMIKLDKETRLKISSNDEIGQLKQQINDLYCALLNTINDLEFKNKEILKLE 229 

I ++ KM+ L+ I DEI L IN LY +L +1 L+ + ++ E 

Sbjct: 181 ILSMSQTAKKMVNLEPNLTCTIHGKDEIAMIASDINRLYASLSTSIKSLQKEYEKASDSE 240 

Query: 230 KLKYDFFKGASHELKTPLSSLKILLENMKYNIGKYKDRDFYISECINIVDNLTKNVSQIL 289 

+ K +F + SHELKTP++S+ +++ M YN+G + DRD Y+ +C ++++ + V IL 
Sbjct: 241 REKSEFLRMTSHELKTPITSVIGMIDGMLYNVGDFADRDKYLRKCRDVLEGQAQLVQSIL 300 

Query: 290 SFYSIKDL-NNDEEYLNVGDTLDEVLEKYSILVNQKKININKELLDYNIYIGKTALNIW 348

S 1+ L + ++E ++ +L+E +E + +L K + + L++ K L 
Sbjct: 301 SLSKIETLASQNQELFSLKSSLEEEMEVFLVLSELKHLKVTINLEEQFVKANJCVYLLKAI 360 

N+I NA YT G + I++ ++ L+I+N + + K L F + D

Sbjct: 361 KNIIDNAFHYTKSGGQVMIQLKDNQLVIKNEAETLLTQQQMKQLFQPFYRPDYSRNRKDG 420 

Query: 403 SNGLGLNIVKNILNKYHIKYE-ILHGENYFIFKI 435
GLGL I IL+++++ Y ++ + + +F I
Sbjct: 421 GTGLGLFITHQILDQHHLAYRFWLDQRKMVFTI 454

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 414 

A DNA sequence (GBSx0450) was identified in S.agalactiae <SEQ ID 1343> which encodes the amino
acid sequence <SEQ ID 1344>. This protein is predicted to be response regulator (regX3). Analysis of this 
protein sequence reveals the following: 

Possible site: 34 

>>> Seems to have no N-terminal signal sequence
INTEGRAL Likelihood = -0.80 Transmembrane 50 - 66 ( 50 - 66)

Final Results

bacterial membrane --- Certainty=0.1319 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9665> which encodes amino acid sequence <SEQ ID 9666> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAD25108 GB:AF140356 VncR [Streptococcus pneumoniae]

Identities = 131/218 (60%) , Positives = 176/218 (80%) , Gaps = 1/218 (0%) 

Query: 5 MKILTVEDDKLIREGISEYLSEFGYTVIQAKDGREALSKFNS-DINLVILDIQIPFINGL 63 
Sbjct:

Query: 64 EVLKEIRKKSNLPILILTAFSDEEYKIDAFINLVDGYVEKPFSLPVLKARIDSLIKKNFG 123

EVL EIRK S +P+L+LTAF DEEYK+ AF +I> DGY+EKPFSL +LK R+D++ K+ + 
Sbjct: 61 EVLAEIRKTSQVPVLMLTAFODEEYKMSAFASLADGYLEKPFSLSLLKVRVDAIFKRYYD 120 

Query: 124 HLEKFEYKHLSVNFNSYTAKINDEKIDVNAKELEILKCLLDNDGQVLTRMQIIDYVWKDS 183

F YK+ V+F SY+A + +++ +NAKELEIL L+ N+G+ LTR QIID VWK +
Sbjct: 121 TC3RIFSYI03TKVDFESYSASLAGQEVPINAKELEILDYLVKNEC3RALTRSQIIDAVWKAT 180

Query: 184 EEIPYDRWDVYIKELRKKLQLDCITTIRNVGYKLERK 221 

+E+P+DRV+DVYIKELRKKL LDCI T+RNVGYKLERK
Sbjct: 181 DEVPFDRVIDVYIKELRKKLDLDCILTVRNVGYKLERK 218 

A related DNA sequence was identified in S.pyogenes <SEQ ID 1345> which encodes the amino acid 
sequence <SEQ ID 1346>. Analysis of this protein sequence reveals the following: 

Possible site: 50
>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -2.60 Transmembrane 48 - 64 ( 48 - 64)

Final Results

bacterial membrane --- Certainty=0.2041 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

>GP:AAF72358 GB:AF192329 VanRB [Enterococcus faecalis]
Identities = 88/215 (40%), Positives = 128/215 (58%), Gaps = 2/215 (0%) 

Query: 3 KILWEDDDTISQVICEFLKANNYDPDCVFDGQAALDKWQTTSYDLIILDIMLPSLSGLE 62
+IL+VEDDD I + FL YD DG A K+ +Y L+ILDIMLP ++G E
Sbjct: 4 RILLVEDDDHIQ^TWGFLAEAGYQVDACTDGNEAYTKFYENTYQLVIIiDIMLPGMNGHE 63

Query: 63 VLKTIRKTSDVPIIMLTALDDEYTQLVSFNHLISDYVTKPFSPLILIKRIENVLRVSTPD 122
+L+ R +D PI+M+TAL D+ Q+ +F+ DYVTKPF IL+KR+E +LR S

[The remaining rows are only partly legible. Query blocks start at residues 123 and 181; Sbjct blocks start at residues 64, 124 and 184. Legible fragments follow.]

- +V GT + LT+KE++I+ L + + +T + ++ IWGY

HIKNLR K+ +KTI G+GY L E
Sbjct: 184 ...HTHIKNLRAKLPENIIKTIRGVGYRLEE 218

An alignment of the GAS and GBS proteins is shown below: 

Identities = 80/214 (37%), Positives = 126/214 (58%), Gaps = 4/214 (1%) 

Query: 6 KILTVEDDKLIREGISEYLSEFGYTVIQAKEGREALSKFNS-DINLVILDIQIPFINGLE 64
KIL VEDD I + I E+L Y DG+ AL K+ + +L+ILDI +P ++GLE
Sbjct: 3 KILWEDDDTISQVICEFLKANNYDPDCVFDGQAALDKWQTTSYDLIILDIMLPSLSGLE 62

Query: 65 VLKEIRKKSNLPILILTAFSDEEYKIDAFTNLVDGYVEKPFSLPVLKARIDSLIKKNFGH 124
VLK IRK S++PI++LTA DE ++ +F +L+ YV KPFS +L RI+++++ +

[The remaining rows are only partly legible. Query blocks start at residues 125 and 185; Sbjct blocks start at residues 63, 123 and 182. Legible fragment:]

+++TR Q++D +W SE
Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 415 

A DNA sequence (GBSx0451) was identified in S.agalactiae <SEQ ID 1347> which encodes the amino 
acid sequence <SEQ ID 1348>. This protein is predicted to be Vexp3. Analysis of this protein sequence 
reveals the following: 

Possible site: 49 

»> Seems to have an uncleavable N-term signal seq 

INTEGRAL Likelihood =-12.68 Transmembrane 423 - 439 ( 413 - 447) 

INTEGRAL Likelihood =-10.67 Transmembrane 16 - 32 ( 12 - 37) 

INTEGRAL Likelihood = -9.77 Transmembrane 303 - 319 ( 301 - 326) 

INTEGRAL Likelihood = -3.13 Transmembrane 343 - 359 ( 343 - 367) 
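
The transmembrane lines above follow a fixed layout and can be collected into records and ordered along the sequence. This is a minimal sketch under that assumption; the field names (`start`, `end`, `outer_start`, `outer_end`) are hypothetical labels, since the document does not define what the parenthesised range represents.

```python
import re

# Hypothetical pattern for "INTEGRAL Likelihood = x Transmembrane a - b ( c - d)".
TM_RE = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(?P<likelihood>-?\d+\.\d+)\s+"
    r"Transmembrane\s+(?P<start>\d+)\s*-\s*(?P<end>\d+)\s*"
    r"\(\s*(?P<outer_start>\d+)\s*-\s*(?P<outer_end>\d+)\s*\)"
)


def parse_segments(text):
    """Return one dict per predicted transmembrane segment found in text."""
    segments = []
    for match in TM_RE.finditer(text):
        fields = match.groupdict()
        segment = {key: int(value) for key, value in fields.items()
                   if key != "likelihood"}
        segment["likelihood"] = float(fields["likelihood"])
        segments.append(segment)
    return segments


report = """
INTEGRAL Likelihood =-12.68 Transmembrane 423 - 439 ( 413 - 447)
INTEGRAL Likelihood =-10.67 Transmembrane 16 - 32 ( 12 - 37)
INTEGRAL Likelihood = -9.77 Transmembrane 303 - 319 ( 301 - 326)
INTEGRAL Likelihood = -3.13 Transmembrane 343 - 359 ( 343 - 367)
"""
# Order the segments by position rather than by likelihood.
segments = sorted(parse_segments(report), key=lambda seg: seg["start"])
for seg in segments:
    print(seg["start"], seg["end"], seg["likelihood"])
```

Sorting by position makes it easier to compare segment layouts between the GBS and GAS proteins of an example.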



Final Results

bacterial membrane --- Certainty=0.6074 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database:

>GP:AAD47594 GB:AF140784 Vexp3 [Streptococcus pneumoniae] 
Identities = 280/458 (61%) , Positives = 363/458 (79%) , Gaps = 3/458 (0%) 

Query: 1 MIKNAFAYVTRKSLKSLIIILVILSMATLSIISLSIKDATDRASKETFANITNSPSMEIN 60 



Query: 61 RQVNPGTPRGGGNVjCi-ttni 1 KI , I ,r- n n/KJ MS V ADLVDHDI IETQDTLANQSPER 120 

R+VN GTPRG GN+KGEDIKKI++ +I+SYVKRIN++ DL +D+IET +T N + +R 
Sbjct: 61 RRWQGTPRGAGNIKGEDIKXITENKAIESWKRINaiGDLTGYDLIETPETKiCMLTADR 120 

Query: 121 AKNFKKTVMLTGVTroSAKETKFVSEAYKLVEGKllL^ 180 

AK F ++M+TGVNDS+KE KFVS +YKLVEG+EL N DK+KIL+HKDLA K+ KVGDK 
Sbjct: 121 AKRFGSSLMITGWDSSKEDKFVSGSYiO^VEGEHLTNDDKDKILLHKDLAAKHGWKVGDK 180 

Query: 181 IKIKSNLFDADNEKVAE^TVEVEIKGLFDGHNSGGVSAAQELYEWTLITDVHSAAKVYGN 240 

+K+ SN++DADNEK A ETVEV IKGLFDGHN V+ +QELYENT ITD+H+AAK+YG 
Sbjct: 181 VKLDSNIYDADNEKGAKETVEVTIKGLFDGKNKSAVTYSQELYENTAITDIHTAAKLYGY 240 

Query: 241 TEDTAVYQDATFFVKGDKNLDSVIKDL-GKLDINVJREYNLIKSSSNYPALQQSISGIYSI 299 

TEDTA+Y DATFFV DKNLD V+K+L G INW+ Y L+KSSSNYPAL+QSISG+Y + 
Sbjct: 241 TEDTAIYGDATFFVTADKNLDDWKELNGISGI^KSYTLVKSSSNYPALEQSISG^KM 300 

Query: 300 SNKLFVGSLIFAGVWSLLLFLWMNARKKEIAVLLSLGISKLEIFGQFIIEMVFISIPAL 359 

+N LF GSL F+ ++++LLL LW+NAR+KE+ +LLS+G+ + I GQFI E + I+IPAL 
Sbjct: 301 ANLLFWGSLSFSVLLLALLLSLWINARRKEVGILLSIGLKQASILGQFITESILIAIPAL 360 

Query: 360 LGS YFLAQYTADKLGNNILNKVTGD IAKQ IARQSAS SQLGGGAEAEGFNKTLSGLD INV- 418 

+ +YFLA YTA +GN +L VT +AKQ ++ + +S LGGGAE +GF+KTLS LDI++ 
Sbjct: 361 VSAYFLANYTARAIGNTVIANvTSGVAKQASKAAQASNLGGGAEVDGFSKTLSSLDISIQ 420 

Query: 419 LPKFIIYWIFMSFVLLVSLILSSIYTLRKNPKELLID 456 

FII V+ + V+LV + L+S LRK PKELL+D 
Sbjct: 421 TSDFIIIFVLALVLWLV-MALASSNLLRKQPKELLLD 457 
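
The summary line that heads each alignment reports counts over the aligned length together with a percentage. A minimal sketch of how such a line can be read is given below. The names `FIELD` and `parse_summary` are hypothetical, and the assumption that the printed percentage is the integer part of the ratio is an inference from the identity figures in this example, not a statement about the alignment program.

```python
import re

# Hypothetical helper, not part of the original analysis software.
FIELD = re.compile(
    r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)\s*\(\s*(\d+)\s*%\s*\)"
)


def parse_summary(line):
    """Return {field: (count, aligned_length, stated_percent)}."""
    return {
        name: (int(count), int(length), int(percent))
        for name, count, length, percent in FIELD.findall(line)
    }


summary = parse_summary(
    "Identities = 280/458 (61%) , Positives = 363/458 (79%) , Gaps = 3/458 (0%)"
)
matches, length, stated = summary["Identities"]
# Assumption: the stated percentage is the ratio with the fraction dropped.
recomputed = matches * 100 // length
print(summary, recomputed)
```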

A related DNA sequence was identified in S.pyogenes <SEQ ID 1349> which encodes the amino acid 
sequence <SEQ ID 1350>. Analysis of this protein sequence reveals the following: 



WO 02/34771 



PCT/GB01/04789 



INTEGRAL Likelihood =-12.90 Transmembrane 19 - 35 ( 16 - 43) 

INTEGRAL Likelihood = -7.27 Transmembrane 371 - 387 ( 359 - 392) 

INTEGRAL Likelihood = -7.01 Transmembrane 335 - 351 ( 326 - 357)

INTEGRAL Likelihood = -6.21 Transmembrane 282 - 298 ( 276 - 308) 

Final Results 

bacterial membrane --- Certainty=0.6158 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

] 

(61%), Gaps = 16/408 (3%) 

Query: 1 MENWKFALSSIWGHKMRSILTMLGIIIGVAAWIIMGLGNAMKNSVTSTFSSKQKDIQLY 60 

+EN + ALSS+ HKMRSILTMLGI IIGV +V++++ +G + + + S ++LY 
Sbjct: 4 LENIRMALSSVLAHKMRS ILTMLGI I IGVGSVIVWAVGQGGEQMLKQS I SGPGNTVELY 63 

Query: 61 FQEKGEE--EDLYAGLHTHENNHEVKPEWLEQIVKDIDGIDSYYFTNSATSTISYEKKKV 118 

+ EE + A + +++K +K I+GI + S + Y +++ 

Sbjct: 64 YMPSDEELASNPNAAAESTFTENDIKG LKGIEGIKQWASTSESMKARYHEEET 117 

Query: 119 DNAS I IGVSKDYFNI KNYD IVAGRTLTDND YSNFSRI ILLDTVLADDLFGKGNYKSALNK 178 

D A++ G++ Y N+ + I +GRT TDND+ +R+ ++ +A +LF K S L + 
Sbjct: 118 D-ATVNGINDGYMNYMSLKIESGRTFTDNDFLAGNRVGIISQKMAKELFDK TSPLGE 173 

Query: 179 WSLSDKDYLVIGVYKTDQTPVSFDGLSGGAVMANTQVASEFGTKEIGSIYIHVNDIQNS 238 

W ++ + +IGV K +SFD LS V N + S FGT + ++ 4 V + . 

Sbjct: 174 VWINGQPVEIIGVLKiCVTGLLSFD-LSE^fYVPFN-MMKSSFGTSDFSNVSLQVESADDI 231 

Query: 239 MNLGNQAADMLTNISHIKDGQYAVPDNSKIVEEINSQFSIMTTVIGSIAAISLLVGGIGV 298 

+ G 4AA LN+H + YV+ +1 I +IMTT+IGSIA ISLLVGGIGV 
Sbjct: 232 KSAGKEAAQ-LVNDNHGTEDSYQVMNMEEIAAGIGKVTAIMTTIIGSIAGISLLVGGIGV 290 

Query: 299 MNIMLVSVTERTREIGLRKALGATRLKILSQFLIESWLTVLGGLIGLLLAQLSVGALGN 358 

MNIMLVSVTERTREIG+RK+LGATR +IL+QFLIESWLT++GGL+G+ + AL + 

Sbjct: 291 MNIMLVSVTERTREIGIRKSLGATRGQILTQFLIESWLTLIGGLVGIGIG-YGGAALVS 349 

Query: 359 AMTLKGACISLDVALIAVLFSASIGVFFGMLPANKASKLDPIEALRYE 406 

A+ + IS V VLFS IGV FGMLPANKA+KLDPIEALRYE 
Sbjct: 350 AIAGWPSLISWQWCGGVLFSMLIGVIFGMLPANKAAKLDPIEALRYE 397 

An alignment of the GAS and GBS proteins is shown below: 

Identities = 56/247 (22%) , Positives = 101/247 (40%) , Gaps = 42/247 (17%) 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 416 

A DNA sequence (GBSx0452) was identified in S.agalactiae <SEQ ID 1351> which encodes the amino 
acid sequence <SEQ ID 1352>. This protein is predicted to be Vexp2 (b0879). Analysis of this protein 
sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3194 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database:



Query: 1 MDILEIKNVNYSYANSKEKVLSGVNQKFELGKFY'AIVGKSGTGKSTLLSLIAGLDKVQTG 60 

M +L++++V Y Y N+- E VL +N FE GKFY+I+G+SG GKSTLLSLLAGLD G 

Sbjct: 1 MTLLQLQDVTYRYKNTAEAVLYQINYNFEPGKFYSIIGESGAGKSTLLSLLAGLDSPVEG 60 

Query: 61 KILFKNEDIEKKGYSNHRKNNISLVFQNYNLIDYLSPIENIRLVNKSVDESILFELGLDK 120 

ILF+ EDI KKGYS HR ++ISLWQNYNLIDYLSP+ENIRLVNK ++ L ELGLD+ 

Sbjct: 61 SILFC^EDIRKKGYSYHRMHHISLVFQNYNLIDYLSPLENIRLVNKKASKNTLLELGLDE 120 



Query: 181 KCVI WTHSKEVADSADI ILELSGKKL 207 

KCVIWTHSKEVA +4-DI LEL KKL 
Sbjct: 181 KCV1WTHSKEVAQASDITLELKDKKL 207 

A related DNA sequence was identified in S.pyogenes <SEQ ID 1353> which encodes the amino acid 
sequence <SEQ ID 1354>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2717 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 83/230 (36%) , Positives = 135/230 (58%) , Gaps = 13/230 (5%) 

Query: 1 TOILEIK!^vOTSYANSKEKVLSGVNQKFEL--GKFYAIVGKSGTGKSTLLSLLAGLDKVQ 58 

M +E+K V+ SY + V + FE+ G+ I+G SG GKST+L++L G+D V 

Sbjct: 5 mFIELKQVSKSYQIGETTVFANHEVSFEINKGELWILGASGAGKSTVIJ^II^^TVD 64 

Query: 59 TGKILFKNEDIE- - -KKGYSNHRKNNISLVFQNYNLIDYLSPIENIRLVNKSVDES 111 

G+++ +DI K + +R+N I VFQ YNL+ L+ EN+ L + V ++ 

Sbjct: 65 AGQVIIDGroiAHYTSKALTQYRRI^IGFi/FQFYNLVPNLTAKENVEIAVEIVADALDPV 124 

Query: 112 -ILFELGLDKKQIKRiraKLSGGQQQRVAIARALVSDAPIILADEPTGNLDSVTAGEIIN 170 

IL E+GL + + +LSGG+QQRV+IARAL + DEPTG LD T +1+ 

Sbjct: 125 TILKEVGLSHR-LDHFPAQLSGGEQQRVSIARALAKNPKLLLCDEPTGALDYQTGKQ1LT 183 

Query: 171 ILKELAQDRNKCVI WTHSKEVADSAD" ILELSGKKLKK- - VNKMNLEVE 218 

+L+++AQ + V++VTH+ +A AD ++ + ++ K +NK +E 
Sbjct: 184 LLQD^QTKGTTWIVTHNAAIAPIADRVIFMHDAQVTKTVIHKEPASIE 233 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 417 

A DNA sequence (GBSx0453) was identified in S.agalactiae <SEQ ID 1355> which encodes the amino 
acid sequence <SEQ ID 1356>. Analysis of this protein sequence reveals the following: 

Possible site: 25

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood = -3.35 Transmembrane 17 - 33 ( 17 - 34)

Final Results

bacterial membrane --- Certainty=0.2338 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 418 

A DNA sequence (GBSx0454) was identified in S.agalactiae <SEQ ID 1357> which encodes the amino 
acid sequence <SEQ ID 1358>. This protein is predicted to be Vexp1. Analysis of this protein sequence
reveals the following: 

Possible site: 56

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood =-11.52 Transmembrane 294 - 310 ( 285 - 312)

INTEGRAL Likelihood =-10.67 Transmembrane 396 - 412 ( 385 - 417)

INTEGRAL Likelihood = -8.76 Transmembrane 17 - 33 ( 14 - 38)

INTEGRAL Likelihood = -4.14 Transmembrane 335 - 351 ( 333 - 357)

Final Results

bacterial membrane --- Certainty=0.5607 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database:

>GP:AAD47592 GB:AF140784 Vexp1 [Streptococcus pneumoniae]
Identities = 165/425 (38%) , Positives = 271/425 (62%) , Gaps = 4/425 (0%) 



Query: 2 1KNAIAYITRKKNRTLIIFAILTIVLSCLYSCLTIMKSSNEIEKALYESSNSSISITK-K 60 

1+ + AY++RK+ R+ I+F IL ++L+ + +CLT+MKS+ +E LY+S N+S SI K + 
Sbjct: 4 IQRSWAYVSRKRLRSFILFLILLVLIUAGISACLTI^KSNKTVESNLYKSLNTSFSIKKIE 63 

Query: 61 DGKYFNINQFKNIEKIKEVEEKIFQYDGIAKLKDLKVVSGEQSINREDLSDEFKNVVSLE 120 

+G+ F ++ ++ KIK +E + + +AKLKD + V+GEQS+ R+DLS N+VSL 
Sbjct: 64 NGQTFKLSDLASVSKIKGLENVSPELETVAKLKDKEAVTGEQSVERDDLSAADNNLVSLT 123 

Query: 121 ATSNTKRNLLFSSGVFSFKEGKNIEENDKNSILWEEFAKQNKLKLGDEIDLELLDTEKS 180 

A ++ +++ F+S F+ KEG+++++ D IL+HEE AK+N L L D+I L+ +E S 
Sbjct: 124 ALEDSSKDVTFTSSAFNLKEGRHLQKGDSKKILIHEELAKKNGLSLHDKIGLDAGQSE-S 182 

Query: 181 GKIKSHKPKIIGIFSGKKQETYTGLSEDFSEI-MVFVDYSTSQEILNKSEHNRIANKILMY 240 

GK ++ +F+IIGIFSGKKQE +TGLSSDFSEN VF DY +SQ +L SB A + Y 
Sbjct: 183 GKGQTVEFEIIGIFSGKKQEKFTGLSSDFSENQVFTDYESSQTLLGNSEAQVSAARF--Y 240 

Query: 241 SGSLESTELALNKLKDFKIDKSKYSIKKDNKAFEESLESVSGIKHIIKIMTYSIMLGGIV 300 

+ + + + ++++ ++ Y ++K+NKAFE+ +SV+ + +1 Y +++ G 
Sbjct: 241 VENPKEMDGLMKQVENLALENC3GYQVEKENK&FEQIKDSVATFQTFLTIFLYGMLIAGAG 300 

Query: 301 VLSLILILWLRERIYEIGIFLSIGTTKIQIIRQFIFELIFISIPSIISSLFLGNLLLKVI 360 

L L+L LWLRER+YE+GI L++G K I QF E++ +S+ +++ + GN + 4- 
Sbjct: 301 ALILVIiSLWLRERVYEVGILLALGKGKSSIFLQFCLEWLVSLGALLPAFVAGNAITTYL 360 

Query: 361 VEGFINSENSMIFGGSL1NKSSFMLNITTLAESYLILISIIVLSWMASSLILFKKPKEI 420 

++ + S + +L SS +1 + AESY+ L+ + LSV + + K PKEI 

Sbjct: 361 LQTLLASGDQASLQDTLAKASSLSTSILSFAESYVFLVLLSCLSVALCFLFLFRKSPKEI 420 

Query: 421 LSKIS 425 

LS IS 
Sbjct: 421 LSSIS 425 
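
The identity count reported for an alignment is the number of columns in which the query and subject residues are the same. The short sketch below applies that definition to the final block of the alignment above. The function name `identity_count` is hypothetical, and the sketch assumes the two strings are already aligned to equal length, with `-` marking a gap.

```python
def identity_count(query, sbjct):
    """Count aligned columns where both sequences carry the same residue."""
    if len(query) != len(sbjct):
        raise ValueError("aligned sequences must have equal length")
    # A gap character never counts as an identity.
    return sum(1 for q, s in zip(query, sbjct) if q == s and q != "-")


# Last block of the alignment: query 421-425 against subject 421-425.
identical = identity_count("LSKIS", "LSSIS")
print(identical, "of", len("LSKIS"))
```

Summing this count over every block of an alignment gives the numerator of the "Identities" figure in its summary line.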

A related DNA sequence was identified in S.pyogenes <SEQ ID 1359> which encodes the amino acid 
sequence <SEQ ID 1360>. Analysis of this protein sequence reveals the following: 

Possible site: 15
>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood =-11.57 Transmembrane 23 - 39 ( 16 - 43)

INTEGRAL Likelihood =-11.36 Transmembrane 371 - 387 ( 362 - 396)

INTEGRAL Likelihood = -8.12 Transmembrane 331 - 347 ( 324 - 360)

INTEGRAL Likelihood = -7.70 Transmembrane 280 - 296 ( 277 - 308)

Final Results

bacterial membrane --- Certainty=0.5628 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases:

>GP:AAB97962 GB:U96166 ATP-binding cassette transporter-like protein
[Streptococcus cristatus]
Identities = 222/311 (71%) , Positives = 278/311 (89%) 

Query: 16 MRSILTMLGI I IGIGAI IAI FSI IEGNTENTKEQLIGGSNNTINIVFNKKSS IDPKFPDK 75 

MRS+LTMLGIIIGIGAIIAIFSIIEGNTENTKRQLIGGSNNTI +V++KKS+IDP P+K 
Sbjct: 1 MRSMLTMLGII IGIGAI IAIFSIIEGNTENTKRQLIGGSNNTIKVVYDKKSAIDPSIPEK 60 

Query: 76 SNAKKPDYLPFMAEEELSKIQQVKGVKNALI3YGIDDKVYHLGQKSSAKISAITKNVAEV 135 

S A+KP Y+PFM E+ LSKI+++ GVKNAL++YG D+K+Y+L QKSS+K+ A++++VA++ 
Sbjct: 61 SQAQKPSYIPFMGEDVLSKIKEIPGVKNALMTYGADEiaYYLSQKSSSKVQAVSQSVADI 120 

Query: 136 RRMTFIKGSDFSDKDFIDQKQVIYLEKSLYESLFPKDDGLGKFVEVMGNPFRVIGVFESK 195 

++ ++G F + F +Q4QV YLEKSLY++LFPK DG+GK+VEV GNPF+VIGVFES 
Sbjct: 121 KQQRLLEGEGFDSEAFKNQEQVAYLEKSLYDTLFPKGDGIGICYVEVKGNPFKVIGVFEST 180 

Query: 196 EQSGLTSGTEKIAYIPLHQWYNINGVTOATPEITIQTYRADDLKPVAKRVSDMLNQTIPK 255 

EQSGLTSG+EK+AYIPL CW+ I ++ +PE+T+QT++ADDLK VAK+VSD LNQ +P+ 
Sbjct: 181 EQSGLTSGSEKVAYIPLQQVfflRIFDTINVSPETVT^QTHKADDLKIWAKKVSDYLNQQMPQ 240 

Query: 256 SDYMFGVMNLKEFERQLDNLNKSNFVLl^GIASISLIVGGIGVMNIMLVSVTERTREIGI 315 

SDYMFGV+NL+EFERQLDNIiN+SNFVLLAGIASISL+VGGIGVMNIMLVSVTERTREIGI 
Sbjct: 241 SDYMFGVIjNLQEFERQLDNLNQSNFVLLAGIASISLLVGGIGVMNIMLVSVTERTREIGI 300 

Query: 316 KKALGARRKLI 325 

KKALGARRK++ 
Sbjct: 301 KKALGARRKIL 311 



An alignment of the GAS and GBS proteins is shown below:

Identities = 79/386 (20%), Positives = 170/386 (43%), Gaps = 38/386 (9%) 

Query: 5 AIAyiTRKKMETLIIFAILTIVLSCLYSCLTIMKSSSIE-IEKaLYESSNSSISITKKDGK S3 

A++ I K R+++ + I + + + +I++ + E ++ L SN++I+I 
Sbjct: 7 ALSS I LSHKMRS ILTMLGI I IGIGAI I AI FS 1 1 EGNTENTKRQLIGGSNNT INI V 61 

Query: 64 YFNINQFKNIEKIKEVEEKIFQYDGLAKLKDLKWSGEQSINREDLSDEFKNVVSLEATS 123 

FN K ++ K F AK D E+ +++ KN + 

Sbjct: 62 -FN KKSSIDPK-FPDKSNAKKPDYLPFMAEEELSKIQQVKGVKNALISYGID 111 

Query: 124 NTKENLLFSSGVFSFKEGKNIEENDKNSILVHEEFAKQNKLKLGDEIDLELLDTE 178 

+ +L S KN+ E + + + +F+ ++ + I LE E 

Query: 179 KSGKIKSHKFKIIGIFSGKKQETYTGLSSDFSENOTFVDYSTSQEILNKSENNRI 233 

K ++ + F++IG+F K+Q 
Sbjct: 172 DDGLGKFVEVMGNPFRVIGVFESKEQ- - 

Query: 234 ANKILMYSGSLESTELALNICLICDFKIDKSKYSIKKDN- KAFEESLESVSGIKHIIK- - IM 290 

+ L+ Mt4 IKSY SXF! L++++ ++ I 

Sbjct: 228 ITIQTVRADDLICPVAICRVSDMLNQTIPKSDYMFGVI'INLKEFERQLDNIjNKSNFVLLAGIA 287 

Query: 291 TYSIMLGGIWLSLILILWLRERIYEIGIFLSIGTTKIQIIRQFIFELIFIS IPSI 346 

+ S4++GGI V++++L+ + ER EIGI ++G + I++QF+ E + ++ + + 
Sbjct: 288 SISLIVGGIGVMNIMLVS-VTERTREIGI10CALGARRICLILKQFLIEAVILTLLGGVIGV 346 

Query: 347 ISSLFLGNLLLKVIVE 



A related GBS gene <SEQ ID 8571> and protein <SEQ ID 8572> were also identified. Analysis of this
protein sequence reveals the following:

Lipop: Possible site: -1 Crend: 10

McG: Discrim Score: 5.59

GvH: Signal Score (-7.5): -5.97
Possible site: 56

>>> Seems to have an uncleavable N-term signal seq

ALOM program count: 4 value: -11.52 threshold: 0.0

INTEGRAL Likelihood =-11.52 Transmembrane 294 - 310 ( 285 - 312)
INTEGRAL Likelihood =-10.57 Transmembrane 396 - 412 ( 385 - 417)
INTEGRAL Likelihood = -8.76 Transmembrane 17 - 33 ( 14 - 38)
INTEGRAL Likelihood = -4.14 Transmembrane 335 - 351 ( 333 - 357)
PERIPHERAL Likelihood =4.51 315
modified ALOM score: 2.80

*** Reasoning Step: 3

Final Results

bacterial membrane --- Certainty=0.5607 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the databases: 

38.7/67.3% over 421aa
Streptococcus
GP|5712667| Vexp1 Insert characterized
ORF00815(304 - 1575 of 1875)
GP|5712667|gb|AAD47592.1|AF140784_1|AF140784(4 - 425 of 425) Vexp1 {Streptococcus
%Match = 25.0
%Identity = 38.7 %Similarity = 67.2
Matches = 164 Mismatches = 135 Conservative Sub.s = 1




48 78 108 138 168 198 228 258 

SIEH*WFDNKTI*T*ELDFVSHSS**VI*DFPLNK*IRKSVTSYINGSIIEIVCQMKKF*WK*F*KH*L*AM*KY*SSG 

288 318 348 378 408 438 468 495 

5 CNSCGVKIERSN*EVIKNAmYITRKiaraT^ 

|: : ||==||= |==|=| II ::|s : =111=111= =1 11 = 1 hi II I -1= I 
MNPIQRSWAWSRI^RLRSFILFLILLVLLAGISACIjTLMKSNKTVESNLYKSLNTSFSIKKIENGQTF 
10 20 30 40 50 60 

10 525 555 585 615 645 675 705 735 

NINQFKNIEKIKEVEEKIFQYDGIiAKLKDLKWSGEQS INREDLSDEFJaT\7VSLEA.TSNTKENIiLFSSGVFSFKEGKKIE 

:: : =: ||| :| : = Hill : hlllh hill 1 = 111 I == = = = 1 = 1 h = IM = = = = 

KLSDmSVSKIKGLEWSPELETVAKLKEKEAOTGEQSVERDDLSAADl^LVSLTALEDSSKDOTFTSSAFNLKEGRHLQ 
80 90 100 110 120 130 140 

15 

765 795 825 855 885 915 945 975 

ENDKNSILVHEEFAKQNKLKLGDEIDLELIjDTEKSGKIKSHKFKIIGIFSGKKQETYTGLSSDFSENMVFVDYSTSQEIL 

: | Ihllhlhl I I hi h =1 III := =1 = 11111111111 =111)1111)1 II II =11 =1 
KGDSKKILIHEELAKKNGLSLHDKIGLDAGQSE-SGKGQTVEFEIIGIFSGKKQEKFTGLSSDFSENQVFTDYESSQTLL 
20 160 170 180 190 200 210 220 

1005 1035 1065 109S 1125 1155 1185 1215 

NKSFJSnraiANKILMYSGSLESTELALNKLKDFKID^ 

|| = | = : = = ==:=: == | ==|=|||lh =11= = =1 I === I I 1= 

25 GNSEA- -QVSAARFYVENPKEMDGLMKQVENIALENQGYQVEKENKAFEQIKDSVATFQTFLTIFLYGMLIAGAGALILV 

240 250 260 270 280 290 300 

1245 1275 1305 1335 1365 1395 1425 1455 

LILWLRERIYEIGIFLSIGTTKIQIIRQFIFELIFISIPSIISSLFLGNLLLKVIVEGFINSENSMIFGGSL1NKSSFML 

30 | |||||hlh||:h = l I I II =l = = = = h = = = == II = = = = == I = = =1 H = 

LSLWLRKRVyHVGII.r,ftLGKGKSSIFLQFCLE\n/LVSLGALLPAFVAGNAITTYLLQTLLASGDQfiSLQDTLAKaSSLST 
320 330 340 350 360 370 380 

1485 1515 1545 1575 1605 1635 1665 1695 

35 NITTI^SYT J ILTSlIVI,SVVMASSLILFKKPKEILSKIS*EOIMDILEIKimiYSYANSKEKVLSGVNQKFELGKFYAI 

' =| ::||||= 1= = III = = = = I I I I I I I II 
SILSFAESYVFLVLLSCLSVALCFLFLFRKSPKEILSSIS 
400 410 420 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 419 

A DNA sequence (GBSx0455) was identified in S.agalactiae <SEQ ID 1361> which encodes the amino 
acid sequence <SEQ ID 1362>. Analysis of this protein sequence reveals the following: 

Possible site: 42

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood = -5.04 Transmembrane 19 - 35 ( 14 - 42)

Final Results

bacterial membrane --- Certainty=0.3017 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 420

A DNA sequence (GBSx0456) was identified in S.agalactiae <SEQ ID 1363> which encodes the amino 
acid sequence <SEQ ID 1364>. Analysis of this protein sequence reveals the following: 

>>> Seems to have an uncleavable N-term signal seq

Final Results

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.



Example 421 

A DNA sequence (GBSx0457) was identified in S.agalactiae <SEQ ID 1365> which encodes the amino 
acid sequence <SEQ ID 1366>. Analysis of this protein sequence reveals the following: 

>>> Seems to have a cleavable N-term signal seq.

Final Results

bacterial outside --- Certainty=0.3000 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 



Query: 11 IRRVSHACTKAGDRFYEENIIaNREFTATAHNQKHCTDVTYLQYGLGAKAYLSAIKDLTOG 70 

++R R EN+LNR F A N+KW TD+TYL +G YL +1 DLYN 

Sbjct: 86 VKRKRRTWINGESR1WENLUNRNFQANKPNEKW\'TDITYLPFGT-EMLYLLSIMDLYNN 144 

Query: 71 SIIAYEISHNNEIHLL 86 

IIAYEIS+ ++ L+ 
Sbjct: 145 EIIAYEISNRQDVTLV 160 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.



Example 422 

A DNA sequence (GBSx0458) was identified in S.agalactiae <SEQ ID 1367> which encodes the amino 
acid sequence <SEQ ID 1368>. Analysis of this protein sequence reveals the following: 

Possible site: 27

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -0.69 Transmembrane 10 - 26 ( 10 - 26)

Final Results

bacterial membrane --- Certainty=0.1277 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 423

A DNA sequence (GBSx0459) was identified in S.agalactiae <SEQ ID 1369> which encodes the amino 
acid sequence <SEQ ID 1370>. Analysis of this protein sequence reveals the following: 

Possible site: 47

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4170 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database:

>GP:AAA56999 GB:U09558 ORFA, putative Helix-Turn-Helix motif from
amino acid 21 through 42 and from amino acid 73 through
99 [Lactobacillus johnsonii]
Identities = 28/116 (24%) , Positives = 59/116 (50%) , Gaps = 6/116 (5%) 

Query: 3 YSTLAKEQGVQGYLDGKGSLRDICKWYDISSRSVLQKWIKRYTSGEDLKATSRGYSRMKQ 52 

YST K + V YL+ + S++ + K Y+I +++++W+ + + L A S +++ 
Sbjct: 4 YSTELKIEIVSKYMraEDSIKGIAKQYNIHW-TLIRRWDK-AKCQGLAALSVKHTKTTY 61 


Query: 63 GRQATFEERVEIVNYTIAHGKDYQAAIEKFGVSYQQIYSIWRKLEKNGSQGLVDRR 118 

+ ++ +V Y + H KF +S Q+Y+W +K + G GL+ ++ 

Sbjct: 62 SS DFKLtmmYYLTHSIGVSKVAAKFNISDSQVYNWAKKFNEEGYAGLLPKQ 113 

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 424 

A DNA sequence (GBSx0460) was identified in S.agalactiae <SEQ ID 1371> which encodes the amino 
acid sequence <SEQ ID 1372>. Analysis of this protein sequence reveals the following:

Possible site: 23

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood = -0.69 Transmembrane 2 - 18 ( 2 - 19)

Final Results

bacterial membrane --- Certainty=0.1277 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 425

A DNA sequence (GBSx0461) was identified in S.agalactiae <SEQ ID 1373> which encodes the amino 
acid sequence <SEQ ID 1374>. This protein is predicted to be integrase (phage-relatedpr). Analysis of this 
protein sequence reveals the following: 

Possible site: 28

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database:

>GP:AAC79517 GB:U88974 ORF1 [Streptococcus thermophilus temperate
bacteriophage O1205]

Identities = 104/172 (60%) , Positives = 127/172 (73%) , Gaps = 11/172 (6%) 

Query: 10 QHQSYAALYLIAKTGMRFJffiCLGLTVNDIDYTNKYLSINKTWDYHFNQRYLPTKNKSSIR 69 
++ SYAALY+ 1 +KTG+RFAECLGLTV+DI LS+NKTWDY N ++PTK KSSIR 

Sbjct: 186 EYASYJ^YIISKTGIRFiffiCIGLTVDDIKRDTGMLSVNKTWDYKNNTGFMPTKTKSSIR 245 

Query: 70 NIPIDNDTLFFLHEFTKNKNDRLFDKLSNNAVNKTIRKITGREVRVHSLRHTFASY 125 

IP+D++ + F+ + + RL LSNNAVNKT+RKI GREVRVHSLRHT+ASY 

Sbjct: 246 EIPLDDEFINFIDQLPPTDDGRLLPSLSNNAVNKTLRKIVGREVRVHSLRHTYASYLIAH 305 


Query: 126 ---LISISQVLDHEM^ITLEWAHQLQEQKDRNDKLNQRNLGRIWSKIALN 174 

LIS+SQVL HENLNITLEVYAHQLQEQK RND+ + ++W K N 

Sbjct: 306 DIDLISVSQVLGHENLNITLEVYAHQLQEQKSRNDE KIKQMWTKCGQN 353 

There is also homology to SEQ ID 578.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 426 

A DNA sequence (GBSx0462) was identified in S.agalactiae <SEQ ID 1375> which encodes the amino 
acid sequence <SEQ ID 1376>. Analysis of this protein sequence reveals the following:

Possible site: 22

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3206 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database, but there is
homology to SEQ ID 1328.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 427 

A DNA sequence (GBSx0463) was identified in S.agalactiae <SEQ ID 1377> which encodes the amino 
acid sequence <SEQ ID 1378>. Analysis of this protein sequence reveals the following:

Possible site: 45

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.6542 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAB52541 GB:AJ131519 hypothetical protein [Lactobacillus 
bacteriophage phi adh] 
Identities = 24/55 (43%) , Positives = 36/55 (64%) 

Query: 12 MDKELTPQEKANKKWAEWNREHRTYLSKRSTARSFINKNATKEDLL3LKQLIESK 66 

M K + KANKKW E N4 + Y++KRSTA4SFI AT+EDL 4+4- + 4 
Sbjct: 1 MAKITEARAKANKKWDEKNKARKLY INKRSTAICS FILNIATEEDLANI EEYVAER 55 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.



Example 428 

A DNA sequence (GBSx0464) was identified in S.agalactiae <SEQ ID 1379> which encodes the amino 
acid sequence <SEQ ID 1380>. Analysis of this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4417 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database, but there is
homology to SEQ ID 1332.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 429 

A DNA sequence (GBSx0465) was identified in S.agalactiae <SEQ ID 1381> which encodes the amino 
acid sequence <SEQ ID 1382>. Analysis of this protein sequence reveals the following: 

Possible site: 28

>>> Seems to have a cleavable N-term signal seq.

Final Results

bacterial outside --- Certainty=0.3000 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 430

A DNA sequence (GBSx0466) was identified in S.agalactiae <SEQ ID 1383> which encodes the amino 
acid sequence <SEQ ID 1384>. Analysis of this protein sequence reveals the following: 

Possible site: 47
>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -4.30 Transmembrane 205 - 221 ( 202 - 223)
INTEGRAL Likelihood = -3.56 Transmembrane 296 - 312 ( 294 - 312)

Final Results

bacterial membrane --- Certainty=0.2720 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9663> which encodes amino acid sequence <SEQ ID 9664>
was also identified.

The protein has no significant homology with any sequences in the GENPEPT database. 

No corresponding DNA sequence was identified in S.pyogenes. 

A related GBS gene <SEQ ID 8573> and protein <SEQ ID 8574> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 8

McG: Discrim Score: -8.80
GvH: Signal Score (-7.5): -4.03

Possible site: 47
>>> Seems to have no N-terminal signal sequence
ALOM program count: 2 value: -4.30 threshold: 0.0

INTEGRAL Likelihood = -4.30 Transmembrane 205 - 221 ( 202 - 223)
INTEGRAL Likelihood = -3.56 Transmembrane 296 - 312 ( 294 - 312)
PERIPHERAL Likelihood =2.97 20
modified ALOM score: 1.36

*** Reasoning Step: 3

Final Results

bacterial membrane --- Certainty=0.2720 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

SEQ ID 8574 (GBS366) was expressed in E.coli as a GST-fusion product. The purified fusion protein 
(Figure 215, lane 5) was used to immunise mice. The resulting antiserum was used for FACS (Figure 281), 
which confirmed that the protein is immunoaccessible on GBS bacteria.

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 431 

A DNA sequence (GBSx0467) was identified in S.agalactiae <SEQ ID 1385> which encodes the amino 
acid sequence <SEQ ID 1386>. This protein is predicted to be N-acetylmuramoyl-L-alanine amidase.
Analysis of this protein sequence reveals the following:

Possible site: 31

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1471 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 8575> which encodes amino acid sequence <SEQ ID 8576> 

was also identified. This has an RGD motif at residues 81-83. 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAB0798S GB:Z93946 N-acetylmuramoyl-L-alanine amidase 
[bacteriophage Dp-1] 
Identities = 99/140 (70%), Positives = 120/140 (85%) 

Query: 10 IWINIEQA1AWMASRKGKVTYSMDYRNGPSSYDCSSSVYFALRSAGASDNGWAVNTEYEH 69
M ++IE+ +AWM +RKG+V+YSMD+R+GP SYDCSSS+Y+ALRSAGAS GWAVNTEY H
Sbjct: 1 MGVDIEKGvAWMQARKGRVSYSMDFRDGPDSYDCSSSriYYALRSAGASSAGWAVNTEYMH 60

Query: 70 DWLIKHGYVLIAENTNWNAQRGDIFIKGKRGASAGAFGHTGMFVDPDNIIHCNYGYNSIT 129

WLI+NGY LI+EN W+A+RGDI FIWG++GASAGA GHTGMF+D DNIIHCNY Y+ 1+
Sbjct: 61 AWLIENGYELISENAPWDAKRGDIFIWGRKGASAGAGGHTGMFIDSDNIIHCNYAYDGIS 120

Query: 130 VMNHDEIWGYNGQPYVYAYR 149

VN+HDE W Y GQPY Y YR
Sbjct: 121 VNDHDERWYYAGQPYYYVYR 140



A related DNA sequence was identified in S.pyogenes <SEQ ID 1387> which encodes the amino acid 
sequence <SEQ ID 1388>. Analysis of this protein sequence reveals the following: 

Possible site: 26 

»> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -1.06 Transmembrane 79 - 95 ( 77 - 95) 

Final Results 

bacterial membrane Certainty=0 . 1426 (Affirmative) < suco 

bacterial outside Certainty=0. 0000 (Not Clear) < suco 

bacterial cytoplasm --- Certainty=0. 0000 (Not Clear) < suco 

An alignment of the GAS and GBS proteins is shown below: 

Identities = 56/91 (61%), Positives = 68/91 (74%) 

17 K+D F ++LD NT L NSN+PYYEAT+ DYYVESKP+ +S DKE + AGTRVR 

Sbjct: 354 KIDKPQSQLTFNQKLDTNTKLDNSNVPYYEATLRTDYYVESKPNASSADKEFIKAGTRVR 413 

Query: 218 VYEKVKGWARIGAPQSNQWVEDAYLIDATDM 248 

VYEKV GW+RI A QS+QWVED YL +AT + 
Sbjct: 414 VYEKVNGWSRINASQSDQWVEDKYLSNATQV 444 

SEQ ID 8576 (GBS301) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 44 (lane 9; MW 30kDa). It was also expressed in E.coli as a GST-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 49 (lane 3; MW 55kDa). 

The GBS301-GST fusion product was purified (Figure 205, lane 4) and used to immunise mice. The 
resulting antiserum was used for FACS (Figure 300), which confirmed that the protein is immunoaccessible 
on GBS bacteria. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
Example 432

A DNA sequence (GBSx0468) was identified in S.agalactiae <SEQ ID 1389> which encodes the amino 
acid sequence <SEQ ID 1390>. Analysis of this protein sequence reveals the following: 

Possible site: 53 
>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood = -6.53 Transmembrane 8 - 24 ( 3 - 25)

Final Results

bacterial membrane --- Certainty=0.3612 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>
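
A "Final Results" block of this kind can be turned into a small table for comparing examples. The following is a hedged sketch rather than a description of the prediction program itself: it assumes the three-line layout shown above, tolerates stray spaces inside the numbers (as can occur in scanned text), and the names `parse_final_results` and `most_likely_location` are invented for illustration.

```python
import re

# location, integer part, four decimal digits, and the call in parentheses
_RESULT = re.compile(
    r"bacterial\s+(cytoplasm|membrane|outside)[\s\-]*"
    r"Certainty\s*=\s*(\d)\s*\.\s*(\d{4})\s*\(([^)]*)\)"
)


def parse_final_results(text: str) -> dict[str, tuple[float, str]]:
    """Map each location to (certainty, call), e.g. 'membrane' -> (0.3612, 'Affirmative')."""
    results = {}
    for location, whole, frac, call in _RESULT.findall(text):
        results[location] = (float(f"{whole}.{frac}"), call.strip())
    return results


def most_likely_location(results: dict[str, tuple[float, str]]) -> str:
    """Return the location with the highest certainty value."""
    return max(results, key=lambda loc: results[loc][0])


block = """
bacterial membrane --- Certainty=0. 3612 (Affirmative) < succ>
bacterial outside --- Certainty=0 . 0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0 . 0000 (Not Clear) < succ>
"""
parsed = parse_final_results(block)
print(parsed, most_likely_location(parsed))
```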

The protein has no significant homology with any sequences in the GENPEPT database. 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 433 

A DNA sequence (GBSx0469) was identified in S.agalactiae <SEQ ID 1391> which encodes the amino
acid sequence <SEQ ID 1392>. Analysis of this protein sequence reveals the following:

Possible site: 34

>>> Seems to have a cleavable N-term signal seq.

Final Results

bacterial outside --- Certainty=0.3000 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 434 

A DNA sequence (GBSx0470) was identified in S.agalactiae <SEQ ID 1393> which encodes the amino 
acid sequence <SEQ ID 1394>. Analysis of this protein sequence reveals the following: 

Possible site: 36

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.0120 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.
Example 435

A DNA sequence (GBSx0471) was identified in S.agalactiae <SEQ ID 1395> which encodes the amino 
acid sequence <SEQ ID 1396>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4757 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9661> which encodes amino acid sequence <SEQ ID 9662> 
was also identified. 

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 436 

A DNA sequence (GBSx0472) was identified in S.agalactiae <SEQ ID 1397> which encodes the amino 
acid sequence <SEQ ID 1398>. This protein is predicted to be a minor structural protein. Analysis of this 
protein sequence reveals the following: 
Possible site: 23 

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -2.39 Transmembrane 349 - 365 ( 347 - 365)

Final Results

bacterial membrane --- Certainty=0.1956 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAF43531 GB:AF145054 ORF39 [Streptococcus thermophilus 
bacteriophage 7201] 
Identities = 212/666 (31%), Positives = 323/666 (47%), Gaps = 52/666 (7%) 

Query: 10 WGNNLTLEILSAWNKP NIASNTSTVNVQVFL KMSSYGYISIGETRPLKITVD 61
Wffl + W +1 +NTS V +++ L + Y + E ++
Sbjct: 5 WSNNDRGYRIRLWVDQVGQDIQNNTSQWLFLLSLLNTTTTFAQYSCSAFVEFNGQRLNWS 64

Query: 62 GRAETINVNPSINYGQRKLLFAKDYIVNHNSDGNKPLFNISAYYPIN--FSNYGEATANQ 119
G + N +1 L + V H DG+ +F + A++ + +S NQ
Sbjct: 65 GSPSVLGWNQTIQ LIDQTITVRHADDGSG-VFGVHAHFNGSGGWSPGNLDIGNQ 117

Query: 120
3 +GN V I+I+R TH L+Y ++ G IA VGTSY V
Sbjct: 118

Query: 180
TIP FAN +PN +G G + V+T
Sbjct: 178

Query: 240
r I+S VKV F A +G+TI Y AEIVG +NSI NG V
Sbjct: 237

Query: 299 DFFGSA--TIRATVTDSRGLTSEPVDTKINVIDYFLPIVCSAKWRSQQNPDILQVLPFV 356

T+R V DSRG+ S+ V+TK+ + YF P + +V RS + DIL + F 
Sbjct: 297 SVWQDTEMTLRGRVQDSRGIWSDWVETKLTFLFYFSPAL-RFEVKRSDKKLDILTIKRFA 355 

Query: 357 KIAPIIVGGIQKNQLKMSVSVAPYOT'GimVDSGAATNTWSTISQMSGAPLNLGGTYDKS 416 

KIAP+ V GIQ+N +K++ S A + VD+G A WS+IS+ + + LG +Y 

Sbjct: 356 KIAPLSWGIQR2WMKLTFSTAKVGVTONFVVDNGQAGGVWSSISEFNASDAKLGNSYPAD 415 

Query: 417 KSWLVKISVSDNLMSATPIIQPVASEFVLVTKAPSGVAFGXIWEHGIIDAKGDVYVDGTI 476
S++V + D ST V ++ V++T GV GK E G 4D GD I 
Sbjct: 416 TSYWIGKLEDEFTS-TSFQATVPTDEVIMTYDRQGVGIGKYRERGALDVNGD 1 468 

Query: 477 YCGDKAIQQKPLALNNGK3SFRHDDTDI^SLQDTG?YCVFRGA1KPAGAGPGYVTVVRHET 536 
Y + IQQ L NNG ++ N+++D GY+FAP + + H +

Sbjct: 469 YANNSPIQQYQLTNNNGSPKMTNNA- -NTIEDPGQYYLFSAA- -PGNPSGQWGHLFHHSS 524 

Query: 537 ANYAYQQFYDRTNKTI FTRLLENGVWSGWSEYVKKD - - SLQTTGWITIG 583 

A Q F+ + ++R++++ W W E+ + D +L TGW G 

Sbjct: 525 YGKGSMYKEAIQIFWSNDGRLFSRHHRWSRI IDD- -VJEPWKEFARNDNTNLINTGWQPAG 582

Query: 584 -NGFKYKRKGDDIDLMYNFASNGLQRWSVGNMPSGI1I--PQELMFAITGWTLAPDKSIHL 640 

+G YKR GD + + +NF G + + ++P + PQ MF +TGW++ +K ++ 
Sbjct: 583 VDGSFYKRVGDVLT I KFNFTGTG - GDFLLASVPPEI FKAPQSYMFWTGWSVWANKQYNV 641 


Query: 641 QINASG 646 

Q+N G 
Sbjct: 642 QVNEGG 647 

No corresponding DNA sequence was identified in S.pyogenes.

SEQ ID 1398 (GBS365) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 82 (lane 2; MW 102kDa). 

GBS365-GST was purified as shown in Figure 216, lane 11. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 437 

A DNA sequence (GBSx0473) was identified in S.agalactiae <SEQ ID 1399> which encodes the amino 
acid sequence <SEQ ID 1400>. This protein is predicted to be a minor structural protein. Analysis of this 
protein sequence reveals the following: 

Possible site: 59

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3481 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAC34413 GB:AF158600 putative minor structural protein
[Streptococcus thermophilus bacteriophage Sfi11]

Identities = 504/998 (50%), Positives = 675/998 (67%), Gaps = 56/998 (5%) 

Query: 1 MLTIHGPDLKPvIiFLDl^KQGAIiNYFNHKWYRKQKTGSSVLEFSvYKKDLLGDSPLSHKY 60 
+LTIH +L+ V ++DN+KQ LN+FN KW R ++G+SV EFSV+KK + DS + Y 
Sbjct: 2 LLTIHDNNLQKVAYIDNEKQSTMFFNDKi'JTRSLESC-TSVFEFSVFKKSIKSDSKVEISY 61

Query: 61 HVIOTQAFVSFVHKGKVQLMIMKIDEDEKQIDC^^ 120 
Sbjct: 62

Query: 121
Sbjct: 122

Query: 181
Sbjct: 182

Query: 240
Sbjct: 241

Query: 296
Sbjct: 301

Query: 356
Sbjct: 361

Query: 411
Sbjct: 421

Query: 465
Sbjct: 479

Query: 525
Sbjct: 539

Query: 585
Sbjct: 598

Query: 645
Sbjct: 658

Query: 705
Sbjct: 718

Query: 765
Sbjct: 778

Query: 825
Sbjct: 838

Query: 885
Sbjct: 898

Query: 945
Sbjct: 926

LT+G NEV D+K TLSK QET LARL+S+A NFDAEIEF+T+L

- ++N+YK Y+ GK+ GV R ++DVIL+Y IQJI+GI+++VDK QIYN I PYG+K

SVDDF+KDYTYLLA Q +Y V GK+NI +YTKGLFR GGA YDYAAAGY

M +IRNGIN+ GNIL+ +-D LW+ P IT N

LK G I N++A T+ WGH II S + + VTVLEQN+ GR YW+NSY

+++ ++T+CYP E+ +G +V G T

L+S M +L +E-rIPYE+K]j+T NG FKN TG+SVL

KNG+ Y+ ++F+KNGDS+I G L+VK +DF + L +TVERYL++ELVAS +I+FTD

--QGPKGDDGVS PINLI IESSNGYQFK 925


A related DNA sequence was identified in S. pyogenes <SEQ ID 1401> which encodes the amino acid 
sequence <SEQ ID 1402>. Analysis of this protein sequence reveals the following: 
Possible site: 37 
Final Results

bacterial membrane --- Certainty=0.2423 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 23/55 (41%) , Positives = 27/55 (48%) 

Query: 886 GADGKDGAPGPQGPEGVNGLQGPKGDQGIQGPAGADGKATYTHIAYALDENGSTG 940 

G GKDGAPG G PG G +G +G+ G QGP G G+ T G G 

Sbjct: 181 GEAGKDGAPGKDGAPGEKGEKGDRGETGAQGPVGPQGEKGETGAQGPAGPQGEAG 235 
Identities = 48/151 (31%), Positives = 58/151 (37%) , Gaps = 19/151 (12%) 

Query: 852 KASDFNHVliNITVEAYLNE--ELVASTQISFTDTEDGADGKDGAPGPQGPPGvNGLCGPK 909 

KDFL ELE+L++I +GGG GPQG G G QGPK 

Sbjct: 82 KEEDFQKELKDFTEKRLKEILDL1GKSGIK GDRGETGPAGPAGPQGKTGERGAQGPK 138 

Query: 910 GD QG1QGPAGADGKATYTHIAYALDENGSTGFS VSDNVGKTYIGMYVDDNIID 962 

GD QG1QG AG G+ E G G + GK D 
Sbjct: 139 GDRGEQG1QGKAGEKGERGEKGDKGETGERGEKGEAGIQGPQGEAGK DGAPGK 191 



Identities = 25/50 (50%) , Positives = 29/50 (58%) , Gaps = 9/50 (18%) 

Query: 884 EDGADGKDGAPGPQGPPGVNGL QGPKGDQGIQGPAGADGKA 924 

+DGA GKDGAPG +G G G QG KG+ G QGPAG G+A 

Sbjct: 185 KDGAPGKDGAPGEKGEKGDRGETGAQGPVGPQGEKGETGAQGPAGPQGEA 234 

SEQ ID 1400 was expressed in four different forms. SDS-PAGE analysis of total cell extract is shown in 
Figure 122 (GBS105dN - lane 5 & 7; MW 102kDa), Figure 122 (GBS105dC - lane 8-10; MW 81kDa), 
Figure 179 (GBS105d - lane 8; MW 102kDa) and in Figure 181 (GBS105C - lane 2; MW 56kDa). 
GBS105dN-His was purified as shown in Figure 232 (lanes 9 & 10). GBS105dC-His was purified as shown 
in Figure 233 (lanes 3 & 4). 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 438 

A DNA sequence (GBSx0474) was identified in S.agalactiae <SEQ ID 1403> which encodes the amino 
acid sequence <SEQ ID 1404>. This protein is predicted to be a minor structural protein. Analysis of this 
protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2502 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAC34412 GB:AF158600 putative minor structural protein
[Streptococcus thermophilus bacteriophage Sfi11]
Identities = 163/433 (37%) , Positives = 244/433 (55%) , Gaps = 21/433 (4%) 
Query: 80 LSSKKPKMLMFSHIPGRYYLAVQVGDLNPKEIKMNGFGEIT--FIVADAYflHSTSYRRIK 137

h +KK L P RYYLA+ G+++ K I + + E T F+V D AHST+Y+R+ 

Sbjct: 93 LHTKKAVKLFLPTEPERYYLALVKGEVSLKGIS-DWYDEATIEFLVPDGVAHSTTYKRVT 151 

Query: 138 DYTQDGNKMTFKIKKNGTAPAFPIFRIKHLGENGYIGITK2TGAFAVGSPEEEDGTIVHR 197 

DY + KM F I N G+ A+PI +K ENGY G+ ++ AF G+ EE DG 1+ + 
Sbjct: 152 DYQEKDGKMIFSIDNEGSTDAYPIITLKAHAENGYYGLVSDKFAFEAGNIEEADGKIISK 211 

Query: 198 NETLFDY-SKAIAQAL-EGAPWAKLNYMPPTFDSELKRMRLDNILGSGKGGEYVAIGAR 255 

E L+D+ I QA +GA NV N + + + N+ G IG + 

Sbjct: 212 AEVLYDFRDDRIPQAFAKGAKNVGITNVTGDLHGT LEIQNVWGRPH- IGLK 261 

Query: 256 GTTPGYGE - HVGTRTFI INPDSNGEY- TLNEHLIWKQIFIATAQDQKGFLKLCVTGENDE 313 

+ + T I PDS+G LNE++WW+QIF A + Q GFLKL V+ + 
Sbjct: 262 NPNANINQLQTASLTLDIPPDSSGOTGAIlffiYIlffilRQIFWAGSISQYGFLKLTVSDADGN 321 

Query: 314 FLYGIETYKRKNGFETEYNFFALDDDGVGV7RFYKQFEFQA-DRNYHNPFSMNRSRAVEIF 372 

FLYG+ET+KR G E+EYN AD G G+RF KQ+ F A + HNPF+ R + +1 
Sbjct: 322 FLYGVETFKRSLGLESEYNALASDGYG-GFRFLKQWSFLATEYSDHMPFK1EPRGWS-DIK 379 

Query: 373 REEDKFRIYFNGAHHHVTVPSLKGKKSRKIHLAI4GTCSDSSKYIimnjFEKVNFEKMGVS 432 

RE+DK Y+ G ++ T+P +KGKKS KIHL + S ++ + F+++ + K + 

Sbjct: 380 REDDKOTFYWWGTYOTFTIPEIKGKKSAKIHLTISNI-PSKSFVTHAYFDQLLYIKTNNA 438 



Query: < 

+ +1 N+Y G +IIN E+DT++ ++ ++ ++V GS IPPGES++ V S W 
Sbjct: 439 FFEDIPMRYIQGSNLIINSEDDTLTLNNLLHLDEIVDGSLWPVIPPGESQIEWQSPWAK 498 

Query: 493 ALPDI SIDFEERY 505 

P ++I+FEER+ 
Sbjct: 499 KKPSVTI E FEERW 511 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics. 

Example 439 

A DNA sequence (GBSx0475) was identified in S.agalactiae <SEQ ID 1405> which encodes the amino 
acid sequence <SEQ ID 1406>. This protein is predicted to be PblA. Analysis of this protein sequence 
reveals the following: 

Possible site: 57 

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -7.11 Transmembrane 427 - 443 ( 424 - 445) 



Likelihood = 
Likelihood = 
Likelihood = 
Likelihood = -0. 



Transmembrane 449 - 465 ( 448 - 469) 

71 Transmembrane 41 - 57 ( 38 - 57) 

37 Transmembrane 361 - 377 ( 361 - 377) 

22 Transmembrane 324 - 340 ( 324 - 340) 



Final Results 

bacterial membrane --- Certainty=0.3845 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAG18638 GB:AY007505 PblA [Streptococcus mitis] 
Identities = 233/401 (58%) , Positives = 296/401 (73%) , Gaps = 17/401 (4%) 

Query: 1 MATNLGQAYVQIMPSAKGISGSISKTLDPEASSAGSSAGSLLGGKLIGILGSVIAAAKIG 60 
MAT + QAYVQ++PSA+GI+G I L+PEAS+AG SAG LG L+G++ VIAAA IG 
Sbjct: 1 MATKIAQAYVQLIPSARGITGKIQSlLNPEaSAAGQSAGQSLGSSLVGVMTKVIAAAGIG 60

Query: 61
Sbjct: 61

Query: 121
SASLLQSLGGDT KAA+ ANMAMIDM+DN+NKMGTSMESIQ AYQGFAKQNYTMLDNLKL
Sbjct: 117

Query: 181
GYGGT++EM+RLL+DA+KLTG KYDI+NLSDVY AIHAIQ + ITGTTAKEAA+TF+GS
Sbjct: 177

Query: 241
FE+MKAA++N+LGK4-ALGE+I PSL AL TTS F+ +NF+PM+ NVF G G V++
Sbjct: 237

Query: 301
Sbjct: 297

Query: 361
+ AT ++N+ D I I ++
Sbjct: 347 FSEEAATQIWIADNIRVTFENIGSAIGDWGIVGDFVGDL 387

Identities = 112/386 (29%), Positives = 172/386 (44%), Gaps = 18/385 (4%)

Query: 235 TTFTGSFEAMKAASKNLLGKMA-LGEDIKPSLKA LFDTTSNFVIJSJNFIPMLTNVFKG 290
TT+ E++KA ++ + L E IK + L T V+ FI U++
Sbjct: TTWHAYVESLKAMWNAVvTFFSDLWESIKEAASTAVWLITTATO 539

Query: 291
Sbjct: 640

Query: 351
Sbjct: 699

Query: 411
Sbjct: 756

Query: 468
Sbjct: 813

Query:
Sbjct: 873

Query: 585
Sbjct: 933

A A S Ii K L LG


A related DNA sequence was identified in S.pyogenes <SEQ ID 1407> which encodes the amino acid
sequence <SEQ ID 1408>. Analysis of this protein sequence reveals the following:

Possible site: 55 
>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -2.76 Transmembrane 458 - 474 ( 458 - 474)
INTEGRAL Likelihood = -2.60 Transmembrane 483 - 499 ( 482 - 499)
INTEGRAL Likelihood = -2.02 Transmembrane 429 - 445 ( 429 - 445)
INTEGRAL Likelihood = -1.28 Transmembrane 397 - 413 ( 397 - 413)
INTEGRAL Likelihood = -0.53 Transmembrane 739 - 755 ( 738 - 755)
INTEGRAL Likelihood = -0.27 Transmembrane 356 - 372 ( 356 - 372)

Final Results

bacterial membrane --- Certainty=0.2105 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>
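
The list of predicted transmembrane segments above lends itself to a simple structured reading. The sketch below is illustrative only: it assumes each segment is reported on one line in the "INTEGRAL Likelihood = x Transmembrane a - b" form shown, and the `Segment` type and `parse_segments` function are hypothetical names introduced here.

```python
import re
from typing import NamedTuple


class Segment(NamedTuple):
    likelihood: float
    start: int
    end: int


# Captures the likelihood value and the first residue range on each line.
_TM = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+\.\d+)\s+Transmembrane\s+(\d+)\s*-\s*(\d+)"
)


def parse_segments(text: str) -> list[Segment]:
    """Extract (likelihood, start, end) for every INTEGRAL line in the text."""
    return [Segment(float(l), int(s), int(e)) for l, s, e in _TM.findall(text)]


block = """
INTEGRAL Likelihood = -2.76 Transmembrane 458 - 474 ( 458 - 474)
INTEGRAL Likelihood = -2.60 Transmembrane 483 - 499 ( 482 - 499)
INTEGRAL Likelihood = -2.02 Transmembrane 429 - 445 ( 429 - 445)
INTEGRAL Likelihood = -1.28 Transmembrane 397 - 413 ( 397 - 413)
INTEGRAL Likelihood = -0.53 Transmembrane 739 - 755 ( 738 - 755)
INTEGRAL Likelihood = -0.27 Transmembrane 356 - 372 ( 356 - 372)
"""
segments = parse_segments(block)
most_negative = min(segments, key=lambda seg: seg.likelihood)
print(len(segments), most_negative)
```

Each segment listed here spans 17 residues (end minus start plus one), which can be confirmed from the parsed values.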


The protein has homology with the following sequences in the databases: 

>GP:AAB18717 GB:U38906 ORF42 [Bacteriophage rlt] 
Identities = 261/579 (45%) , Positives = 359/579 (61%) , Gaps = 63/579 (10%) 

Query: 184 MKRLLSDAEKLPAAMGKKFDLSNYADWEAIHLVQDNMGIAGVAAEERKTTFSGSLAAMK 243
M+RLL+DA+KL G+K+D+SN++D+ +AIH +Q M I G A+FA TTFSGS +MK 
Sbjct: 1 MQRLLTDAQKLT GQKYDISNFSDITQAIHAIQTEMDITGTTAKEASTTFSGSFDSMK 57 

Query: 244 SSFTNWIAGLSLGDDIRPALRGIjAETTSNFLFGNFIPMVANIFKGLPSAIGTFIGAAAPI 303 
++ +NV+ LSLG D++ L L TTS FLF NFIPMV NIFK LP AI TF+ AA

Sbjct: 58 AAMSNVLGNLSLGRDLQGPLNALVSTTSTFLFKNFIPMVGNIFKALPGAISTFVSAAGKE 117 



Query: 304 ITSQ FQGLMSSLG- IS1DLSPIT 325 

++SQ F L+SS+G IS + + 

Sbjct: 118 LSSQLGNGIGSGFSDFTAKFSSILSPLCGSFQTIVSGLKPVFDSLLSSIGPISTQIMGVF 177 

Query: 326 AKFAQIGQNLQ PVFNGLKTAFSQLPSFFTSIGSAVAPVIDTIISGIARLDFSGFEA 381 

+K Q+ N+ PV + L AF QLPS F +1 AV P+IDTI SG++RLDFSG +A 

Sbjct: 178 SKLPQLFSNVISAVIPVISTLSVAFGQLPSLFFAISVAVQPMIDTISSGISRLDFSGIQA 237 

Query: 382 LISAIbPALQAGFSNFAAIVGPAISGWDSFVGMWNAAQPLISILSDALMPVFQILGSFL 441 

+ ISA++PA+ G + I+GP+I +V+SFV MWN+ QPL ++++ ALMP FQ+LG+F+ 
Sbjct: 238 I ISALVPAITTGITTMMGI IGPS IDTJjVKS FVKMWNS IQPI ATVIAGALMPAFQVLGAFI 297 

Query: 442 GGWKGALMGVSFAFDAVKVAIQLVTPIIDLIjVQGLNFVQPvIiSVIAEWIGVAIGMFGNL 501 

GGV+KGA++ +S FD ++V + +TPII ++ PVL+ +A+W+G AIG F N 

Sbjct: 298 GGVTjKGAMLALSATFDTIRVWGFLTPIIAAVIiAKFQEFAPVLATVAQWVGTAIGFFANF 357 

Query: 502 GTAGQGLSAFIKSAWTNIOTAISTAGTIISWIDYIKIAFSGAGSAVGvIKNIFSLAWMA 561 

G AG L I SAW I++ IS+ +1 +1+ K F+G GSA G L+++ S AW 
Sbjct: 358 GAAGTSLKGLITSAftNGIKSIISSVVSGIGGIINTAKAIFTGLGSAGGALRSMISGAWSG 417 

Query: 562 MGDAINVAKGI ISSVINGIKSAFSSFS SLVSSVGSAVNGVIDSISSTIRG--- 611 

+ 1+ G IS INGIKS FSS S++S V S + G+I SSTI G 

Sbjct: 418 IRSIISSVGGSISGTINGIKSFFSSLGGSGNGLRSVMSGVWSGITGIISGASSTISGIID 477 

Query: 612 LANIDISGAGAAIMNGFLNGLKSAWGAVKSFVSGIANWIAEHKGPISYDRVL 663 

L NID++GAG A+++GF+ GLKS W A K FV GIA+WI +HKGPISYDR + 
Sbjct: 478 GIKNIFNSLKNIDLAGAGRAVIDGFVGGLKSTWEAGKKFVGGIADWIKDHKGPISYDRKI 537 

Query: 664 LKPAGKAIMGGLNTSLIDGFKEVKSNVSGMADDLASTMT 702 

L PAG+AIMGG N SL++ FK V+ NVSG+A + S +T 
Sbjct: 538 LIPAGQAIMGGFNDSLMENFKAVQKNVSGIAKQIQSAIT 575 



An alignment of the GAS and GBS proteins is shown below:

Identities = 272/701 (38%), Positives = 371/701 (52%), Gaps = 91/701 (12%) 





Query: 1 MATNLGQAYVQIMPSAKGISGSISKTLDPEASSAGSSAGSLLGGKLIGILGSVIAAAKIG 60
MAT LGQAYVQIMPSA+GISG+ISK LDPEA SAG SAGSL+GG L+ ++G I AAA IG
Sbjct: 1 MATELGQAYVQIMPSARGISGAISKQLDPEARSAGLSAGSLIGGNLVKMIGGAIAAAGIG 60

Query: 61 EMVTKAISSSISEGAALQQSLGGVETLFKSimNLVKKYADEAYKTTGLSANAWlESVTGF 120
+M ISS++S GA LQQS GG++TL+K VK +A EAYK G+SAN Y E
Sbjct: 61 KM ISSALSAGADLQQSFGGIDTLVKGAETAVKGFAKEAYKA-GISANTYAEQAVSM 115

Query: 121 SASLLQSLGGDTAKAAKVAi^MAMIDMADNSNKMGTSMESIQYAYQGFAKQErYTMLDl^KL 180
ASL QSLGGD AAK ANMA+ +DMADNS KMGT + SIQ AYQGFAKQNYTMLDNL+L
Sbjct: 116 GASLKQSLGGDAVAAAKAAM^IMDMADNSAKMGTDITSIQMAYQGFAKQNYTMLDNLRL 175

Query: 181 GYGGTQEEMKRLLSDAQKL---TGKKYDISNLSDVYEAIHAIQGKIGITGTTAKEAATTF 237
GYGGT+EEMKRLLSDA+KL GKK+D+SN +DV EAIH +Q +GI G A+EA TTF
Sbjct: 176 GYGGTKEEMKRLLSDAEKBPAaMGKKFDLSNYADVVEAIHLVQDNMGIAGVAAEEAKTTF 235

Query: 238 TGSFEAMKAASKNLLGKMMjGEDIKPSLKALFDTTSWFVLIOTFIPMLTNVFKGFGSVISL 297 

+GS AMK++ N+4- ++LG+DI+P+L+ L +TTSNF+ NFIPM+ N+FKG S I 
Sbjct: 236 SGSriAAMKSSFTNVMAGLSLGDDIRPALRGLAETTSNFLFGNFIPMVANIFKGLPSAIGT 295 

Query: 298 TFSELIPKIV GFMQTSGPSLMQSGISFIISFV NGFLTAY PAFLTV 342 

PI G M + G S+ S 1+ + + NG TA+ P+F T 

Sbjct: 296 FIGAAAPIITSQFQGLMSSLGISIDLSPITAKFAQIGQNLQPVFNGLKTAFSQLPSFFTS 355 

Query: 343 AGKIFTDFVSFVMQSIPGL LQAGATIiVLNLIDGILANLPQIATSAVS-VISSFISM 397 

G + ++ + L +A + +L + +N I A+S V+ SF+ M 

Sbjct: 356 IGSAVAPVIDTIISGIARLDFSGFEALISAILPALQAGFSNFAAIVGPAISGVVDSFVGM 415 

Query: 398 LQANYPAI LKKGFEILSYLVQGI IARLPDIVIT 430 

API L F+IL + G+ + + D+++ 

Sbjct: 416 WHAAQPL I S I LSDALMPVFQ ILGSFLGG WKGALMG VS FAFDAVKVAIQLVTP I IDLLVQ 475 

Query: 431 VGKLIAIIAGAIASNLPKVIALGV--QLLITFVKGILSVIGKINETANNIGEKLIN 484 

V +++++A I + LG D L F+K +1 TA I +1+ 

Sbjct: 476 GLNFVQPVLSVIAEWIGVAIGMFGNLGTAGQGLSAFIKSAWTNIQTAISTAGTIISTVID 535 

Query: 485 AIKSI DLLSAGRAIMRGFLRGLEDWGDIQNFVGDIA 521 

IK D ++ + 1+ + G++ + + V + 

Sbjct: 536 YIKLAFSGAGSAVGVLKNIFSLAWMAMGDAINVAKGIISSVINGIKSAFSSFSSLVSSVG 595 

Query: 522 GWIKDHKGPISYDRRLLI PAGNAIMQGLHQG1VDKFKPVKNLVNGMAEEIQSSFG 576 

+ IS R L AG AIM G GL + VK+ V+G+A I G 

Sbjct: 596 SAVNGVIDSISSTIRGLANIDISGAGAAIMNGFIiNGLKSAWGAVKSFVSGIANMIAEHKG 655 

Query: 577 NPQLAFDMDTKVNNGFERIGTRIKNLSSQVTSTDNYTSGNA 617 

+++D G +G LN +L + SG A 

Sbjct: 656 --PISYDRVLLKPAGKAIMGGLNTSLIDGFKEVKSNVSGMA 694 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.



Example 440 

A DNA sequence (GBSx0477) was identified in S.agalactiae <SEQ ID 1409> which encodes the amino 
acid sequence <SEQ ID 1410>. Analysis of this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2565 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAG18637 GB:AY007505 unknown [Streptococcus mitis] 
Identities = 64/119 (53%), Positives = 87/119 (72%), Gaps = 2/119 (1%)

Query: 1 MLKMDEDALVCDLAETYHIYDYKQLPPLK'/AVFSLGLREESRINRVISGNRVSFERRIIiA 60 

M++ DEDAL+ CDLAETY I+DY+QLP +VAVF+ GLR++SRI ++ ++V FE +LA 
Sbjct: 1 MIQTDEDALICDLAETYGIFDYRQLPADQVAVFAFGLRDDSRIKLAMTNSKVPFETFLIiA 60 

Query: 61 GMFDRLGMLIWMKTTDGQKGKNRPEMVSTMF- -DNQQKDSEWSFGSGKDFEETRNNIL 117 
G+ DRL L+W KTTDGQKG N+P MV+ + K+S+ + F SG+DFEE R IL 

Sbjct: 61 GVLDRLSALVWFKTTDGQKGINKPUWTEELTGKTKAKESKEyilFDSGEDFEEYRQKIL 119

A related DNA sequence was identified in S.pyogenes <SEQ ID 1411> which encodes the amino acid
sequence <SEQ ID 1412>. Analysis of this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2905 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 60/123 (48%), Positives = 82/123 (65%), Gaps = 2/123 (1%) 

Query: 1 MLKMDEDALVCDLAETYHIYDYKQLPPLKVAVFSLGLREESRINRVISGNRVSFERRILA 60 

M+ D+DAL CDLAETY IYDY+QLP +VAVF++GLR SRI +SG + + +LA 
Sbjct: 1 MIAKDDDALTCDIAETYGIYDYRQLPAYQVAVFAVGLRSNSRIKMALSGETEALDTVLLA 60 

Query: 61 GMFDRLGMLIWMKTTDGQKGKNRPEMV- - STMFDNQQKDSEWSFGSGKDFEETRNNILG 118 

G++D +L W KT DGQ G+N+P+ V + QK 4+V+SF SG+DFE R +LG 

Sbjct: 61 GIYDNTNLLFWSKTKDGQSGQNKPKSWEAISGSKBQKANDVISFVSGEDFENARKQLLG 120 

Query: 119 FGG 121 

G 

Sbjct: 121 GDG 123 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 441 

A DNA sequence (GBSx0478) was identified in S.agalactiae <SEQ ID 1413> which encodes the amino 
acid sequence <SEQ ID 1414>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2280 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAG18636 GB:AY007505 unknown [Streptococcus mitis]
Identities = 40/80 (50%), Positives = 62/80 (77%), Gaps = 1/80 (1%) 

Query: 3 TSSGFEYKIEESRLKNYELVFjy^LESNPLSLPKVLRDLLGDQVESLKNHLRASDGTVS 62 

TS+GF ++I + RL+NYEL+EA++++++NP LPK7++L+LG++ E LKNH+R +DG V 
Sbjct: 24 TSTGFPFEITKERLENYELLEAISE^/DTNPAvLPKvVKLMLGNKSEDLKNHVRTADGIVP 83 



Query: 63 

+ + E+ EIF S QLKK 
Sbjct: 84 LDKMGAE I SEI FS SQNQLKK 103 

A related DNA sequence was identified in S.pyogenes <SEQ ID 1415> which encodes the amino acid 
sequence <SEQ ID 1416>. Analysis of this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4365 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
An alignment of the GAS and GBS proteins is shown below:

Identities = 42/75 (56%) , Positives = 60/75 (80%) 

Query: 2 KTSSGFEYKIEESRDKISrfELvEflLADLESNPLSLPKVIiRLLLGDQVESLKmiLI^SrJSTV 61 

KT+SGFEY+I + RLKN+ELVEA+A+ E++P ++ K++ LLLGD +SLK H+R ++G V 
Sbjct: 7 KTTSGFEYEIPKKRLKNFELVEAIAEEETDPTAWKIVNLLLGDAAKSLKEHVRDAEGIV 66 

Query: 62 STEALMEEVKEIFES 76 

EA+ E+KEIFES 
Sbjct: 67 DVEAIGVEIKEIFES 81 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.



Example 442 

A DNA sequence (GBSx0479) was identified in S.agalactiae <SEQ ID 1417> which encodes the amino 
acid sequence <SEQ ID 1418>. This protein is predicted to be Structural protein. Analysis of this protein
sequence reveals the following:

Possible site: 44

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3461 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 



+A +NVTTAKPKIGGA+Y+APLGT LP D ++L++AF++LGYIS+DG++N + ESE 
Sbjct: 1 MATEANVTTAKPKIGGAVYSAPLGTALPTDATTKLDQAFEALGYISDDGMTNSNSPESEN 60 

Query: 62 IQAWGGDVVESAQKSKADKFTYTLIEALNIEvLKEIYGKDNVTGDLKTGITVKSNSKPLE 121 

I+AWGG W S QK K D FY LIEALN+ VLKE+YG DNV+GDL +GIT+K+NSK L 
Sbjct: 61 IKAWGGVWSSVQKEKTDTFKYMLIEALNLHVLKEVYGPDNVSGDLSSGITIKANSKELP 120 

Query: 122 EHCLVIEMILraTOWRIVIPKGKVSEVGEIia'VDI^EAaGYETTLQAFPDAEGNTHYEYI 181 

HCLVIE +LK +KRIVIP GKV+ + EI Y D GY TT+ AFP+A +THYEYI 
Sbjct: 121 HHCLVIETVLKGGVLKRIVIPSGKVTAIDEITYiTOGSVLGYGTTVTAFPKAADDTHYEYI 180 

Query: 182 KGA 184 
KGA 

Sbjct: 181 KGA 183 

A related DNA sequence was identified in S.pyogenes <SEQ ID 1419> which encodes the amino acid 
sequence <SEQ ID 1420>. Analysis of this protein sequence reveals the following: 

Possible site: 13 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2379 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 
Identities = 119/182 (65%) , Positives = 142/182 (77%) 
Query: 4 NSSNVTTAKPKIGGAIYTAPLGTELPmTASELNEAFKSLGyiSEDGLSNEDKRESEEIQ 63

++ NVT+AKPK GGAIY+APLGTELPKD SEKSF FK+LGY+SEDG+ NED R SE 1 +
Sbjct: 6 DTKNVT£AICPKTGGAIYSAPLGTELPI<DAKSEIOTKFI<1 , ILGY 1 /SEDG\^/NEDTRSSENIK 65

Query: 64 AWGGDVVESAQKSKADKFTYTLIEALNIEVLKEIYGKDNVTGDBKTGITVKSKSKPLEEH 123 

AWGGD+V + Q K DKFTY LIE+LN+EVLKE+YG NVTGDL GI +KSNSK LE H 
Sbjct: 66 AWGGDIVGAVQTEKEDKFTYKLIESLNVEVLKEVYGAVNVTGDLSGGIQIKSNSKELEAH 125 

Query: 124 CLVIEMILKNNTVKRIVIPKGKVSEVGEIKYVDNEAAGYETTLQAFPDAEGNTHYEYIKG 183

+V4+MI+ +KRIV+P KV EVGEIKYVD E GYETTL+ FPD +G+TH EYI 
Sbjct: 126 VIVVDM^GGILKRIVLPNAKVDEVGEIKYVDGEWGYETTLKCFPDKDGDTHREYIVK 185 

Query: 184 AG 185 
G

Sbjct: 186 PG 187 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 443

A DNA sequence (GBSx0480) was identified in S.agalactiae <SEQ ID 1421> which encodes the amino 
acid sequence <SEQ ID 1422>. Analysis of this protein sequence reveals the following: 

Possible site: 58 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2214 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAB18710 GB:U38906 ORF35 [Bacteriophage rlt] 
Identities = 52/78 (66%), Positives = 66/78 (83%)

Query: 1 MSKFKFKENEQ.GVAELMKSSEMQQVLTTKATAIRERCGDGYAQDIHVGKNRANAMVSAKT 60

M+K FKLN++GVA +MKS EMQ +Ii KA+A+++RCG GY QD+HVGKNRANAMV A+T 
Sbjct: 1 MAKNLFKLNRSGVAS^KSPEMQAILKEKASAVKQRCGPGYGQDMHVGKNRANAMVFAET 60

Query: 61 IKAKKDNSKKNTLLKAVR 78 
+AK+DN KNNT+LKAVR

Sbjct: 61 YQAKRDNMKNNT1LKAVR 78 

A related DNA sequence was identified in S.pyogenes <SEQ ID 1423> which encodes the amino acid 
sequence <SEQ ID 1424>. Analysis of this protein sequence reveals the following: 

Possible site: 54

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2446 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 75/78 (96%), Positives = 76/78 (97%) 


Query: 1 MSKFKFKMKAGVAELMKSSEMQQVLTTKATAIRERCGDGYAQDIHVGKNRANAMVSAKT 60 

MSKFKFKLN+AGVAELMiffiSEMQQVLTTKATAIRERCGDGY QDIHVGKNRANAMVS KT 
Sbjct: 1 MSKFKFKlMlAGVAELMKSSEMQQWTKATArRERraDGYVQDIHVGKNRANAWSTKT 60 
Query: 61 I KAKKDNSKNNTLLKAVR 78

I KAKKDNSKNNTLLKAVR 
Sbjct: 61 IKAKKDNSKNNTLLKAVR 78 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 444 

A DNA sequence (GBSx0481) was identified in S.agalactiae <SEQ ID 1425> which encodes the amino 
acid sequence <SEQ ID 1426>. Analysis of this protein sequence reveals the following: 

Possible site: 38

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2888 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAB18709 GB:U38906 ORF34 [Bacteriophage rlt]
Identities = 41/59 (69%), Positives = 45/59 (75%)

Query: 1 MTGKKVEYILAI PKGDKHDWEDKEVCFFDKKWRTVGIjALEGIEELI PLEWNKKVMVERY 59 

+TGKK Y LfllPK D HDWE+K+V FF K WRT G LEGIE LIPL+WNKKV VE Y 
Sbjct: 56 LTGia<AIYTLAIPKia3THDWENia<TOFFGKTKRTFGEPLEGIEGLIPLDWNKKUWEHY 114 


A related DNA sequence was identified in S.pyogenes <SEQ ID 1427> which encodes the amino acid 
sequence <SEQ ID 1428>. Analysis of this protein sequence reveals the following: 

Possible site: 39 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2779 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 51/60 (85%) , Positives = 57/60 (95%) 

Query: 1 MTGKKVEYII^IPKGDKHDWEDKEVCFFDKKWRWGIjALEGIEELIPIjEWNKKVMVERYE 60 
+TGKKVEY+LAIPKGD+HDWE+KEV FF KKWRTVG+ IiEGIEELIPL+WNKKVMVERYE

Sbjct: 50 LTGKKVEYVIAIPKGDEHDWENKEWFFGKXWRWGIPLEGIESLIPLDVilNKlCVMVERYE 109 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 445

A DNA sequence (GBSx0482) was identified in S.agalactiae <SEQ ID 1429> which encodes the amino 
acid sequence <SEQ ID 1430>. Analysis of this protein sequence reveals the following: 

Possible site: 25 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2770 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
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The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAB18708 GB:U38906 ORF33 [Bacteriophage r1t]
Identities = 89/130 (68%), Positives = 106/130 (81%), Gaps = 1/130 (0%)

Query: 1 MTNFATTDDVILLWRQLSVDEIKRAEALLETVSDTLRLEASKVGKNLDEMILETP-YFAT 59
M FAT DD+ +LWR L DE +RAE LLE VSD+LR EA KVG++L MI E P YFA+
Sbjct: 1 MNPFATVDDLTMLWRPLKGDEKERAEILLEIVSDSLREEADKVGRDLYAMIAEKPSYFAS 60

Query: 60 VLKSVTVDIVARTLMTATQGEPMSQESQSALGYTWSGTYLVPGGGLFIKDSELKRLGLKK 119
V+KSVTVDIVARTLMT+T EPM+Q ++SALGY+ SG+YLVPGGGLFIK+SEL RLGLKK
Sbjct: 61 VVKSVTVDIVARTLMTSTDQEPMTQTTESALGYSVSGSYLVPGGGLFIKNSELSRLGLKK 120

Query: 120 QRYGGIELYG 129
QR+G I+ YG
Sbjct: 121 QRFGVIDFYG 130



A related DNA sequence was identified in S.pyogenes <SEQ ID 1431> which encodes the amino acid 
sequence <SEQ ID 1432>. Analysis of this protein sequence reveals the following: 



Possible site: 37

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2061 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below: 

Identities = 115/138 (84%) , Positives = 129/138 (93%) 

Query: 3 NFATTDDVILLWRQLSVDEIKRAEALLETVSDTLRLEaSKVGKNLDEMILETPYFATVLK 62 

NFATTDDVILLWR LSVDE+KRA ALL+ VSDTLR+EA KVGK+LD+ +++ PYF V+K 
Sbjct: 3 NFATTDDVILLWRPLSTOELKRANALLKWSDTI^EADKVGKDLDKTMVDKPYFVNVIK 62 

Query: 63 SVTVDIVARTLMTATQGEPMSQESQSALGYTWSGTYLVPGGGLFIKDSELKRLGLKKQRY 122 

SVTVDIVARTLMT+T+GEPM+QESQSALGYTWSGTYLVPGGGLFIKDSELKRLGLKKQRY 
Sbjct: 63 SVTVDIVARTLMTSTRGEPMAQESQSALGYTWSGTYLVPGGGLFIKDSELKRLGLKKQRY 122 

Query: 123 GGIELYGEIERNNSYFSR 140 

GGIELYGEIER+NS FSR 
Sbjct: 123 GGIELYGEIERDNSCFSR 140 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 



Example 446

A DNA sequence (GBSx0483) was identified in S.agalactiae <SEQ ID 1433> which encodes the amino 
acid sequence <SEQ ID 1434>. This protein is predicted to be Structural protein. Analysis of this protein 
sequence reveals the following: 

Possible site: 30

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3015 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database: 
Query: 5 IKAGTLFKPELVTEIMSKVKGHSTLAKLSGQTPIPFNGVEQFVFNLDGNAQIVGEGEQKL 64
+ GTLF P LVT+++SKV G S++A+LS Q PIPFNG + F F +D +V E +K
Sbjct: 3 1OTGTLFDPTLOTDLISKVAGKSSIARLSAQKPIPFKGEKVFTFTMDSEIDWAESGKKT 62

Query: 65
^ + P+K Y AR++DEF YAS+E+++N L+ + DGFAKK+A D+ A H
Sbjct: 63

Query: 125
Sbjct: 123

Query: 181
++K+KD DN ++PE ++G P
Sbjct: 180

Query: 241
<I FKWGYA+ +P+E+I+YGDPD SG DLK YN++ +R E F+GWGILD
Sbjct: 239

A related DNA sequence was identified in S.pyogenes <SEQ ID 1435> which encodes the amino acid 
sequence <SEQ ID 1436>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2772 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 133/298 (44%) , Positives = 187/298 (62%) , Gaps = 2/298 (0%) 

Query: 1 MAESIKAGTLFKPELVTEIMSKVKGHSTLAKLSGQTPIPFNGvEQFVFNLDGNAQIVGEG 60 

M +LF LV+++++KVKGHS+LAKLS Q PIPFNG ++F F LD + +V E 

Sbjct: 1 MGTETSKASLFDKHLVSDLINKVKGHSS1AKLSSQKPIPFNGSKEFTFTLDSDIDVVAEN 60 

Query: 61 EQKLGNTAKVTSKIIKPLKFVYQAm^TDEFKYASEEKRIMFLKHYADGFAKKMAEAFDIA 120 

+K +1 P+K Y AR++DEF YA+EE++++ LK + +GFAKK+A D+ 

Sbjct: 61 GKKTHGGLSLEPVTIVPIKVEYGARLSDEFLYATEEEKIDILKAFNEGFAKKLARGIDLM 120 

Query: 121 AIHGLEPRTMTDASFKATNSFDGWTGNVIKYEADKIDDNIDAAVTTIVANGNDVTGIAL 180 

A+HG+ PRT + TN FD VT V E++ D NI+AAV I + VTG+A+ 
Sbjct: 121 AMHGINPRTKKASDVIGTNHFDSKVTQWKFTESEDADANIEAAVNLIQGSEGVVTGLAM 180 

Query: 181 SPQAGQDMSK-RKDKFDNvMYPEFRFGQRPSNFFNMTLDINKTLTMKGGTAKD-DHAIVG 238 

+ ++K + MYPE +G P + + +N T+ A+ D I+G 

Sbjct: 181 DTEFSTALAKVTNGEMGPKMYPEL&WGANPDS INGLKSSVNTTVGAGADEAESKDLVI IG 240 

Query: 239 DFQNMFKWGYAENIPMEIIEYGDPDGSGRDLKAYNEILLRTEAFIGWGILDEKAFSRV 296 

DF++MFKWGYA+ IPMEII+YGDPD SG+DLK YN+I BR EA+IGWGILD K+F+RV 
Sbjct: 241 DFESMFKMGYAKQIPMEIIKYGDPDNSGKDLKGYNQIYLRAEAYIGWGILDAKSFARV 298 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 447 

A DNA sequence (GBSx0484) was identified in S.agalactiae <SEQ ID 1437> which encodes the amino 
acid sequence <SEQ ID 1438>. Analysis of this protein sequence reveals the following: 
Possible site: 61

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2224 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9659> which encodes amino acid sequence <SEQ ID 9660>
was also identified.

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAB18705 GB:U38906 ORF30 [Bacteriophage r1t]
Identities = 64/158 (40%), Positives = 101/158 (63%), Gaps = 8/158 (5%)

Query: 43 MSEFKVIETQEELDTIVKARIARERE KYQDYDQLKTRVEELETENSSLQTALNDAK 98
MSE + +TQEEL+ I++ R+AR++E + DYD+LKT++ LE +N++ Q + ++K
Sbjct: 1 MSENNLPKTQEELNQIIETRLARQKETIEANFADYDELKTKIAALEADNTAYQATIEESK 60

Query: 99 SNTDSYTEKITTLENQIAGYEAANLRTKVALQYGLPIDLANRLQGDDEDGLKVDAERLAS 158
S + ++ E QI+GY+ L+ +A++ GLP+DLA+RL GDDE+ LK DAER +
Sbjct: 61 S WEQEKADYEKQISGYKTTQLKQSIAIKAGLPLDLADRLSGDDEESLKADAERFSG 116

Query: 159 FIKPSQPQPPTKSNEPIITDQKEAGWIEMASNLVNKGE 196
FIKP P P K EP + D K+ + ++ L +GE
Sbjct: 117 FIKPKTPPAPLKDVEPNLGDGKDGAYRKLVDGLKTEGE 154

A related DNA sequence was identified in S.pyogenes <SEQ ID 1439> which encodes the amino acid
sequence <SEQ ID 1440>. Analysis of this protein sequence reveals the following:

Possible site: 59

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3475 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 128/149 (85%), Positives = 136/149 (90%)

Query: 43 MSEFKVIETQEELDTIVKARIAREREKYQDYDQLKTRVEELETENSSLQTALNDAKSNTD 102
MSEFKVIETQEELDTIVKARIAREREKYQDYDQLKTRVEELETENSSLQTALNDAKSNTD
Sbjct: 1 MSEFKVIETQEELDTIVKARIAREREKYQDYDQLKTRVEELETENSSLQTALNDAKSNTD 60

Query: 103 SYTEKITTLENQIAGYEAANLRTKVALQYGLPIDLANRLQGDDEDGLKVDAERLASFIKP 162
SYTE+I+TL+NQIA YE ANLRTKVALQYGLPIDLA+RLQGDDEDGLKVDAERLASFIKP
Sbjct: 61 SYTEEISTLKNQIADYETANLRTKVALQYGLPIDLADRLQGDDEDGLKVDAERLASFIKP 120

Query: 163 SQPQPPTKSNEPIITDQKEAGWIEMARNL 191
SQPQPP KSNEP I +A + + + L
Sbjct: 121 SQPQPPAKSNEPNIDSNADANYRALVQGL 149

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 448 

A DNA sequence (GBSx0485) was identified in S.agalactiae <SEQ ID 1441> which encodes the amino
acid sequence <SEQ ID 1442>. Analysis of this protein sequence reveals the following:

Possible site: 56

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2888 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 



Query: 8 KLGNQRPTQSWLHFAKTIAHEAINYYKKTGLSCYLKQEMvlLIPMMAINEDNLWVHQKYG 67 

+ GNQ PTQSV L F +T EAI V+K+ CY KQ+N+L +MAI+ED LW HQK+G 
Sbjct: 6 RFGKQypTQSVILPFTETKYQEAIEIYEKSKHECYPKQKNLLKEVMAIDEDGLWTHQKFG 65 

Query: 68 YAIPRRNGKTEWYILELWALHKGLKILHTAHRISTSHSSFEKVKKYLEMSGYVDGEDFI 127 

Y+IPRRNGKTE+VYILELW+L +GL ILHTAHRISTSHSS+EK+KKYLE SGYV+GEDF 
Sbjct: 66 YSIPRRNGKTEIWILELWSLVQGLSILHTAHRISTSHSSYEKLKKYLEDSGYVEGEDFK 125 

Query: 128 SNKAKGQERIEFKSSGSVIQFRTRTSNGGLGEGFDLLIIDEAQEYTAEQESALKYTVTDS 187 

S KAKGQER+E SG VIQFRTRTS+GGLGEGFD+L+ IDEAQEYT EQESALKYTVTDS 
Sbjct: 126 SIKAKGQERLELIESGGVIQFRTRTSSGGLGEGFDILVIDEAQEYTTEQESALKYTVTDS 185 

Query: 188 DNPMTIMCGTPPTMVSTGTVFESYRKECLKGDRRYSGWAEWSVDEMQPIHDVKSWYVANP 247 

DNPMTIMCGTPPT VS+GTVF +YR + G +YSGWAEWSV++++ IHDV++WY 4-NP 
Sbjct: 186 DNPMTIMCGTPPTPVSSGTVFTNYRDOTIAGKRKYSGWAEWSvEDVTOIHDVEAWYNSNP 245 

Query: 248 SMGYHLNERKIEAELGEDEIDHNIQRLGYWPSFNQKSVISEKEWAKLKVEQVPELKSKLF 307 

SMGYHLNERKIEAELGED++DHN+QRLGYWP +NQKSVISE+EW LKV ++P +K KLF 
Sbjct: 246 SMGYHI^RKIEAELGEDKLDHNVQRLGYWPKYNQKSVISEQEVJNALKMKLPVIKGICLF 3 05 

Query: 308 VGI KFGQDGNNVSLS IAARASENKVFVEAIDCLSVRNGTQWI INFLKSADIAKWVDGAS 367 

VGIK+G DG NV++SIA + KVFVE IDC S+RNG QWIINFLK AD+ KW+DG S 
Sbjct: 306 VGIKYGNDGA1WAMSIAVKTLSGKVFVETIDCQS1RNGNQWIINFLKKADVEKWIDGQS 365 

Query: 368 GQELLAQEMREHGLKKPELPKVAEI ITftNTMWEQGIMQETI CHNDQPSLTAWTNCEKRQ 427 

GQ +L EM++ LK+P LP V EII AN++WEQGI Q+ CH+ QPSL+ WTNC+KR 
Sbjct: 366 GQSILTSEMKDFKLKEPILPTVKEIINANSLWEQCIFQKNFCHSGQPSLSTVVTNCDKRN 425 

Query: 428 IGSNGGFGYKSLYDDRDISLMDSALLAHWICYTTKPKRKQR 468 

IG++GGFGYKS +DD DISLMDSALLAHW C KPK+KQ+ 
Sbjct: 426 IGTSGGFGYKSQFDDMDISLMDSALLAHWACSNNKPKKKQQ 466 

A related DNA sequence was identified in S.pyogenes <SEQ ID 1443> which encodes the amino acid 
sequence <SEQ ID 1444>. Analysis of this protein sequence reveals the following: 

Possible site: 32

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3133 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 437/471 (92%) , Positives = 459/471 (96%) 

Query: 1 MVTKTKAKLGNQRPTQSVNLHFAKTIAHEAINYYKICT^ 60 

MVTKTK KLGNQRPTQSVNLHEAK+LAHEAIIfyyKiCrGLSCY WQ NMLIP+M&I+E+ I, 
Sbjct: 6 MVTKTKTKLGNQRPTQSVNLHFAKSIAHEAINYYKKTGLSCYPWQVNMLIPIMAIDENGL 65 

Query: 61 WVHQKYGYAIPRRNGKTEWYILELWALHKGLKILHTAHRISTSHSSFEKVKKYLEMSGY 120 

WVHQKYGYAIPRRNGKTEWYI++LWALHKGLKILHTAHRISTSH+SFEKVKKYLEMSGY 
Sbjct: 66 WVHQKYGYAIPRRNGKTEVVYIVQLWALHKGLKIIjHrAHRISTSHftSFEKVKKYLEMSGY 125 
Query: 121 VDGEDFISNKAKGQERIEFKSSGSVIQFRTRTSNGGLGEGFDLLIIDEAQEYTAEQESAL 180

VDGEDFISHKAKGQERIEFK+SG+VIQFRTRTSNGGLGEGFDLL1IDEZVQEYT+EQESAL 
Sbjct: 126 VDGEDFISNKAKGQERIEFKASGAVIQFRTRTSNGGLGEGFDLLIIDEAQEYTSEQESAL 185 

Query: 181 KYTVTDSDNPMTIMCGTPPTMVSTGTVFESYRKECLKGDRRYSGWAEWSVDEMQPIHDVK 240

KYTVTDSDNPMTIMCGTPPTMVSTGTVFE+YRK+CLKG++RYSGWAEWSV EM I+DV 
Sbjct: 186 KYTVTDSDNPMTIMCGTPPTMVSTGTVFEAYRKIXJLKGIIKRYSGWAEWSVPEMVKINDVS 245 

Query: 241 SWYVJWPSMGYHKNERKIEAELGEDEIDHNIQRU3YWPSFNQKSVISEKEWAKLKUEQVP 300 

SWY++NPSMG+HLNERKIERELGEDEIDH1<IIQRLGYWPSFNQKSVISEKEWAKLKVEQVP 
Sbjct: 246 SWYISNPSMGFHL^RKIEAELGEDEIDHNIQRLGYWPSFNQKSVISEKEWAKLKVEQVP 305 

Query: 301 ELKSKLFVGIKFGQDGNNVSLSIAARASENKVFVEAIDCLSVRNGTQWIINFLKSADIAK 360 

ELKSKLFVGIKFGQDGNNVSLSIAAR SENKVFVE IDCLSVRNGTQWI INFLKSADIAK 
Sbjct: 306 ELKSKLFVGIKFGQDGNNVSLSIAARTSENCTFVETIDCLSVRNGTQWIINFLKSADIAK 365 

Query: 361 VVVDGASGQELIAQEMREHGLKKPELPKVAEIITANTNIWSQGIMQETICHNDQPSLTAW 420 

W+DGASGQELLAQEM++ GLKKPELPKVAEI ITAN MWEQGIKQETICH+DQPSBTAW 
Sbjct: 366 WIDGASGQELIAQEMKDQGLKKPELPKVAEIITANMMVM3QGIMQETICHSDQPSLTAW 425 

Query: 421 TNCEKRQIGSNGGFGYKSLYDDRDISLMDSALLAHWICYTTKPKRKQRTSC 471

TNCEKRQIGSNGGFGYKSLYDDRDISLMDSALLAHWICYTTKPKRKQRTSC 
Sbjct: 426 TNCEKRQIGSNGGFGYKSLYDDRDISLMDSALLAHWICYTTKPKRKQRTSC 476 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 449 

A DNA sequence (GBSx0486) was identified in S.agalactiae <SEQ ID 1445> which encodes the amino 
acid sequence <SEQ ID 1446>. Analysis of this protein sequence reveals the following: 



>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2745 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.
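
Each "Final Results" block lists a certainty score for three candidate locations. As a hedged illustration only, the Python sketch below reads such a block and picks the location with the highest certainty. The function name parse_final_results and the regular expression are assumptions introduced here, not part of the localization program, and the pattern tolerates stray spaces inside the numbers.

```python
import re

# Hypothetical pattern for lines such as
# "bacterial cytoplasm --- Certainty=0.2745 (Affirmative) < succ>".
LINE_RE = re.compile(
    r"bacterial\s+(\w+)\W*Certainty\s*=\s*([0-9.\s]+?)\s*\(([^)]*)\)"
)


def parse_final_results(text):
    """Return {location: (certainty, label)} for one Final Results block."""
    results = {}
    for match in LINE_RE.finditer(text):
        site, raw_score, label = match.groups()
        # Drop any whitespace that crept into the number, e.g. "0. 2745".
        score = float(re.sub(r"\s+", "", raw_score))
        results[site] = (score, label.strip())
    return results


# Values taken from the Final Results block of Example 449.
block = """
bacterial cytoplasm --- Certainty=0.2745 (Affirmative) < succ>
bacterial membrane --- Certainty=0. 0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
"""
results = parse_final_results(block)
best = max(results, key=lambda site: results[site][0])
print(best, results[best])
```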



Example 450 

A DNA sequence (GBSx0487) was identified in S.agalactiae <SEQ ID 1447> which encodes the amino 
acid sequence <SEQ ID 1448>. Analysis of this protein sequence reveals the following: 

Possible site: 32

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2568 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAB18703 GB:U38906 ORF28 [Bacteriophage r1t]
Identities = 124/250 (49%), Positives = 164/250 (65%), Gaps = 3/250 (1%)

Query: 2 VDDVLPKLLKSVQQDFEKHFGKSEWAKAFM;LQAKKRTYKTVI<IEFAVEVGRLI.SLRI^ 61 

++D+LP LL+ + QDF++ S+ + ++ L+ KKATY NEF VEVG++LS L 
Sbjct: 1 MEDILPPLLEKINQDFDERAMSKKLKQSMELLKTKKATYIQANEFGVEVGQILSDVLGT 60 

Query: 62 SVISDELPDGKMYYKIJ^LVHDTLRHMYKLISDyAGDVQQNLNKQAKISLKIQRPPIJ!IQ 121 

V D LPDGKMY+NIA+RL+N L+ N+ LIS Y+ DVQ LN+ A LK Q P LNQ 
Sbjct: 61 EOTVDVLPDGKMYFNIADRLMSILKKNFDLISGYSTDVQSELNQIAGFKLKSQVPELNQ 120 

Query: 122 DKIDGLV1TOIASEPVFDDVKWLLDEPIVNFSQSIVDDCIRANADFHFKTGLKPTIERIST 181 

D+IDG+VNR++SE F+ + WLL EPIV FSQS+VDD ++ N DF K GLKP I R 
Sbjct: 121 DRIDGIVNRISSEDDFEKILWLLKEPIVTFSQSWDDTLKKNIDFQAKAGLKPKIVRKLV 180 

Query: 182 GKCCDWCDRLAGRYVYHEEPKDFYKRHQHCQCVIDYHPK--NGKRQNSWSKKWTKETTDI 239 

GK CDWC DAG Y Y P D Y RH+ C+C ++Y P+ + KRQ+ WSK W D 
Sbjct: 181 GKACDWCRNLAGSYDYPNVPSDVYHRHERCRCIYSYDPRDIDKKRQDWSKNWVDPDKDA 240 

Query: 240 -LERRKQMNI 248 

+ RK +N+ 
Sbjct: 241 KIAERKNLNL 250 

A related DNA sequence was identified in S. pyogenes <SEQ ID 1449> which encodes the amino acid 
sequence <SEQ ID 1450>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3099 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 169/261 (64%), Positives = 207/261 (78%), Gaps = 2/261 (0%) 

Query: 1 MVDDvLPKLLKSVQQDFEKHFGKSEWAKAFAELQAKKATYKTVNEFAVEVGRLLSLAIA SO 

MVDD VLPKLLKSV+QDFEK+ FG+ S + W KAFAELQAKK TYKTVNEFA+EVGRLLSLAL 
Sbjct: 1 MTODvTjPKLLKSWQDFEKYFGESDvVTKAFAELQAKKOT 60 

Query: 61 NSVISDELPDGKMYYNIANRLVNDTLRHNYKIilSDYAGDVQQNmKQAKISLKIQRPPM 120 

SV SD+LPDGKMYYNIA RL+++T+ NYKLIS YAGDVQ+ LN+ A+I LK+QRPPLN 
Sbjct: 61 GSVSSDKLPDGKMYYNIAKRLLDETMGRNYKLISGYAGDVQRIIi^NAQIGLKVQRPPLN 120 

Query: 121 QDK1DGLVNRIASEPVFDDVKWLLDEPIVNFSQSIVDDCIRANADFHFKTGLKPTIERIS 180 

+DKI+G+VNRL SE FDDVKWL EPIVTJFSQSIVDD I+ANAD +KTG+ P + R 
Sbjct: 121 RDKINGMVNRLDSENTFDDVKWLFGEPIVNFSQSIVDDTIKANADLQYKTGMTPQVVRTE 180 

Query: 181 TGKCCDWCDRIAGRYVYHEEPKDFYKRHQHCQCVIDYHPKNGKRQNSWSKKWTK- -ETTD 238 

+G CC+WC + G Y Y + PKD ++RHQ C+C +DY PKNGK Q++WSK W K +T + 
Sbjct: 181 SGNCCEWCREWGTYSYPKVPKDVWRRHQRCRCTLDYDPKNGKVQSAWSKIWRKKEKTQE 240 

Query: 239 ILERRKQMNIDIRDNNRKSDI 259 

+ER ++ + K+DI 

Sbjct: 241 SIERVEKFKESALVESIKNDI 261 



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 451

A DNA sequence (GBSx0488) was identified in S.agalactiae <SEQ ID 1451> which encodes the amino
acid sequence <SEQ ID 1452>. This protein is predicted to be Structural protein. Analysis of this protein
sequence reveals the following:

Possible site: 58

>>> Seems to have no N-terminal signal sequence
INTEGRAL Likelihood = -1.38 Transmembrane 93 - 109 ( 93 - 110)

Final Results

bacterial membrane --- Certainty=0.1553 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAC39307 GB:AF022773 ORF5 [Lactococcus bacteriophage phi31]
Identities = 271/410 (66%), Positives = 326/410 (79%), Gaps = 2/410 (0%) 

Query: 1 M^GMGYLQRKLALFKTGVDKRYRYYiMDDra 60 

M G+GYL+ KL++ K + RY YAM D + I +P + + YRS++ W AKGVD 
Sbjct: 1 MTEKGIGYLRFKLSVHKRRAE^YEQYAMKHVDRFKGITIPQALSQQYRSILGWCAKGVD 60 

Query: 61 SLADRIIFREFANDDFNAWEIFKANNPDIFFDTAIQSALIASCCFVYIMPGKEDSLPKMQ 120 

SLADR+IFREF NDDF EIF+ NNPDIFFD+A+ SALIASC F+Yx G+ D++ ++Q 
Sbjct: 61 SLADRLIFREFE1TODFTVNEIFEENNPDIFFDSAVLSALIASCSFIYISKGENDAV-RLQ 119 

Query: 121 VIEASKATGILDPTTFLLTEGYAVLESDSNEWPTLEAYFTGEKTWYYPKDEKP-YSIDNS 179 

VIEA ATGI+DP T LLTEGYAVLE D N N LEA+F ++T YY +D + SI N 
Sbjct: 120 VIFAVmTGIIDPITGLLTEGYAVLERDENNNVVnEMFLPDRTDYYYRDSRNNISIANP 179 



Sbjct: 180 TGHPJ.iVFl I H PORVI Pi ^^i<ITRSGMYKQSNAKRTLERAI3VTAKFYSFPQKYVTGLS 239 

Query: 240 PDAEPMEKWRATVSTLLEISKDEDGDKPTVGQFTTASMAPFMDHLKMYASLFAGGSGLTL 299 

DAEPME W+ATVS++L+ +KDEDGDKPT+GQFT SM+PF + L+ A+ FAG +GLTL 

Sbjct: 240 DDAEPMETWKATVSSMLQFTKDEDGDKPTLGQFTQPSMSPFTEQLRTAAAGFAGETGLTL 299 



Query: 300 DDLGFPSDNPSSVEAIKAAHENLRAAGRKAQRSFSSGFLNVAYIAVCLRDDFPYLRNQFM 359 
DDLGF SDNPSS VEAI KA+HENLR AGRKAQRS +G LNVAY+A CLRDD PYLR QF 
Sbjct: 300 DDLGWSDNPSSV^ftlKASHENLRLAGRKAQRSLGAGLLNVAYLAACLRDDVPYLREQFS 359

Query: 360 DTEIKWEPLFEADANMLTLVGDGAIKLNQAIPGFMDADVIRDLTGVKGSD 409 

T+ KWEPLFEADA+ML+L+GDGAIKLNQAIP F++ D IRDLTG+KG++ 
Sbjct: 360 KTKPKWEPLFEADASMLSLIGDGAIKLNQAIPEFINKDTIRDLTGIKGAE 409 


A related DNA sequence was identified in S.pyogenes <SEQ ID 1453> which encodes the amino acid 
sequence <SEQ ID 1454>. Analysis of this protein sequence reveals the following: 

Possible site: 58

>>> Seems to have no N-terminal signal sequence
INTEGRAL Likelihood = -1.38 Transmembrane 93 - 109 ( 93 - 110)

Final Results

bacterial membrane --- Certainty=0.1553 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 395/422 (93%), Positives = 407/422 (95%) 
Sbjct: 1 MNYMGMGYLRRKLALFKTGVDKRYR^ 60

Query: 61 SLADRIIFREFAtTODFNAWEIFKANNPDIFFDTAIQSMiIASCCFVYIMPGKEDSLPKMQ 120 

SLADRIIFREF NDDFNAWE I FKANNPD I FFDTAI QSALIASCCFVYIMPG ED LPKMQ 
Sbjct: 61 SLADRI I FREFTNDDFNAWEI FKANNPD I FFDTAIQSALIASCCFVYIMPGAEDGLPKMQ 120 

Query: 121 V1EASKATGILDPTTFLLTEGYAVLESDSNENPTLEAYFTGEKTWYYPKDEKPYSIDNST 180 

VIEASKATGILDPTTFLLTEGYA+LESDSN NPTLEAYFT + WYYPK KPY+I N T 
Sbjct: 121 VIEASKATGILDPTTFLLTEGYAILESDSNGNPTLEAYFTDKDIWYYPKKGKPYNIKNPT 180 

Query: 181 GHPLLVPVIHRPDAVRPFGRSRITKAGMYHQKAAKRTLERAEVTAEFYSFPQKYVLGMDP 240 

GHPLLVP+IHRPDAVRPFGRSRITKAGMYHQKAAKRTLERAEVTAEFYSFPQKYVLGMDP 
Sbjct: 181 GHPLLVPIIHRPDAWPFGRSRITKAC^IYHQKAAKRTLSRAEVTAEFYSFPQKYVLGMDP 240 

Query: 241 I 
I 

Sbjct: 241 I 

Query: 301 DLGFPSDNPSSVFAIKAAHENLRAAGRKAQRSFSSGFLNVAYIAVCLRDDFPYLRNQFMD 360 
Sbjct: 

Query: 361 TEIKVffiPLFFJU3ANMLTLVGDGAIKlNQAIPGFMDADVIRDLTGVKGSDNPIPKATEVTT 420 

T IKWEPLFFJU2ANMLTLVGDGAIKLNQAIPGFMDADVIRDLTGVKG+D PIP TEVTT 
Sbjct: 361 TVIKWEPLFEADANMLTLVGDGAIKLKQAIPGFMDADVIRDLTGVKGADKPIPAITEVTT 420 

Query: 421 DG 422 
DG 

Sbjct: 421 DG 422 

SEQ ID 1452 (GBS364) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 73 (lane 6; MW 50kDa). It was also expressed in E.coli as a GST-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 81 (lane 11; MW 75kDa). 

GBS364-GST was purified as shown in Figure 216, lane 10. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 452 

A DNA sequence (GBSx0489) was identified in S.agalactiae <SEQ ID 1455> which encodes the amino 
acid sequence <SEQ ID 1456>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4063 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 

A related DNA sequence was identified in S.pyogenes <SEQ ID 1457> which encodes the amino acid 
sequence <SEQ ID 1458>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4120 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below: 

Identities = 101/118 (85%) , Positives = 110/118 (92%) 

Query: 1 MKKKCLICKKTFQAKTITOSLYCSEECRKKGIREKQEIKIiMKQKRADKKKEKIKVIiNWIADV SO 

+KKKCLICKK FQAKTNR+LYCSEECRKKG REKQRKLMKQKRA+++KEK KVLN N DV 
Sbjct: 1 LKKKCIilCKKNFQAKTNRTLYCSEECRKKGNREKQRKM 60 

Query: 61 TEKPKKIRNLVQHYKKLKREILDNESEFGFTGIALVEGIDIHEENFVDLVMQKIKEQQ 118 

TEKPKKIRNL QHYKKLK+EIL NESEFGFTGI L+EGID+HEENFVDLVMQKIKEQ+ 
Sbjct: 61 TEKPKKIFJ3IAQHYKKLKKEILANESEFGFTGITL1EGIDVHEENFVDLVMQKIKEQK 118 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 



Example 453 

A DNA sequence (GBSx0490) was identified in S.agalactiae <SEQ ID 1459> which encodes the amino 
acid sequence <SEQ ID 1460>. Analysis of this protein sequence reveals the following: 

Possible site: 19

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.0633 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAC39305 GB:AF022773 ORF3 [Lactococcus bacteriophage phi31]
Identities = 75/109 (68%), Positives = 87/109 (79%), Gaps = 1/109 (0%)

Query: 29 LRADKKGTHRVAFEKNKRRLLKTAHLCGICGRPVDKSLKYPHPLSAAIDHIVPIAKGGHP 88
LRAD+ G HRVAF+KN++ LLKT + CGICG+P+DK LK P PLS +DHI+PI KGGHP
Sbjct: 3 LRADRTGAHRVAFDKNRKILLKTQNTCGICGKPIDKRLKAPDPLSPVVDHIIPINKGGHP 62

Query: 89 SSIDNLQLTHWQCNRQKSDKLFINQTAVRATVVGNRNLPQSRDWSSYAS 137
S++DNLQL HW CNRQKSDKLF N V+GNRNLPQSRDWSSY S
Sbjct: 63 SAMDNLQLAHWTCNRQKSDKLF-NVKQEEPKVLGNRNLPQSRDWSSYVS 110

A related DNA sequence was identified in S.pyogenes <SEQ ID 1461> which encodes the amino acid
sequence <SEQ ID 1462>. Analysis of this protein sequence reveals the following:

Possible site: 49

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4185 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below: 

Identities = 88/112 (78%), Positives = 102/112 (90%)

Query: 28 KLRADKKGTHRVAFEKNKRRLLKTAHLCGICGRPVDKSLKYPHPLSAAIDHIVPIAKGGH 87
+LRADKKGTHRVAF++NK++LLK A +CGICG+PVDKSLKYPHPLSAAIDHIVPIAKGGH
Sbjct: 3 QLRADKKGTHRVAFDRNKKKLLKAATVCGICGKPVDKSLKYPHPLSAAIDHIVPIAKGGH 62

Query: 88 PSSIDNLQLTHWQCNRQKSDKLFINQTAVRATVVGNRNLPQSRDWSSYASKE 139
PS+++NLQLTHWQCNRQKSDKLF NQ + +GNRNLPQSRDWSS+A K+
Sbjct: 63 PSALENLQLTHWQCNRQKSDKLFANQASNEPKTIGNRNLPQSRDWSSFAFKK 114

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 454 

A DNA sequence (GBSx0491) was identified in S.agalactiae <SEQ ID 1463> which encodes the amino 
acid sequence <SEQ ID 1464>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4481 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 455 

A DNA sequence (GBSx0492) was identified in S.agalactiae <SEQ ID 1465> which encodes the amino 
acid sequence <SEQ ID 1466>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2907 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAF43508 GB:AF145054 ORF15 [Streptococcus thermophilus
bacteriophage 7201]
Identities = 61/187 (32%) , Positives = 90/187 (47%) , Gaps = 31/187 (16%) 

Query: 1 MNIEEAKKLIDKQSIGKGGVGDIPVVKTH1VKVLLDQIDQPQPEVPRFVADWYEKHKDSL 60 

MN +FA K I K+ 4- + L D I +P VP++VADWYE+HKD 

Sbjct: 1 MNRDEAVKKIAKEGY ISIEHAEDLYDSIIT-KPWPQYVADWYEEHKDEF 49 

Query: 61 ECDL YLYHMSIY--DEEVEKDDFYYWMQTSKNPVYTLINMHQFGYTIQKEKLYT 112 

+L + H4+ Y +E DF W +KN + L+NMHQFGY +4-KEK YT 

Sbjct: 50 YI^HRVvRDFFEHmAYYFNENPIDYDFAamOTKNAlQILVNMHQFGYEVKKEKRYT 109 

Query: 113 VEIPN--PNERQLSFVLMRQLSGNVSIK'7MHRDNLDLLKTDNDLQLTESEIRKDFDWAWQ 170 

V I N E L++ R+ + RDN D +T + T E+ ++ + W 
Sbjct: 110 VRIRNLDDEETYLNYDKFRE TWVFYSRDNTDRFRTIH THKEL - EEGGFGWV 159 

Query: 171 FREEWE 177 

F E +E 
Sbjct: 160 FDCEGIE 166 

A related GBS nucleic acid sequence <SEQ ID 10927> which encodes amino acid sequence <SEQ ID 
10928> was also identified. 

A related DNA sequence was identified in S.pyogenes <SEQ ID 1467> which encodes the amino acid 
sequence <SEQ ID 1468>. Analysis of this protein sequence reveals the following: 

Possible site: 21

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3815 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 70/180 (38%) , Positives = 98/180 (53%) , Gaps = 30/180 (16%) 

Query: 1 ^INIEEaKKLIDKQSI-GKGGVGDIPvvKTHIvKVIlLDQIDQPQPEVPEFVADWYEKHKDS 59 

MNIEEAK+L4D GK V+K V+ ++DQ++QP+PEVP+ VADW E+ K+ 

Sbjct: 1 MNIEE&KELVDNSKFYGKTS SVIKAE-VRDI IDQLNQPKPEVPQCVADWIEECKEE 55 

Query: 60 LECDLyLYHMSiyDEEVEKDDFYyMQTSKNPVyTLINMHQFGYTIQKEKLYTVEIPN-- 117 

DL L ++ + W+ S + GYT++KEKLYTV++PN 

Sbjct: 56 DLTL--KGLFSNSDMPAKIFDKIFGSI3ENCRLMAFAWINGYTVEKEKLYTVDLPNGQ 110 

Query: 118 PNERQLSFVLMRQLSGITVSIKVMHRDNLDLLKTDNDLQLTESEIRKDFDWAWQFREEVVE 177 

P R ++ + Q L T+N ++LTESEIRKDF+WAWQF EEV E 

Sbjct: 111 PLVRGINTLYFSQN LATEN-VKLTESEIRKDFEWAWQFAEEVTE 153 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.



Example 456 

A DNA sequence (GBSx0493) was identified in S.agalactiae <SEQ ID 1469> which encodes the amino 
acid sequence <SEQ ID 1470>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.5365 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 457 

A DNA sequence (GBSx0494) was identified in S.agalactiae <SEQ ID 1471> which encodes the amino 
acid sequence <SEQ ID 1472>. Analysis of this protein sequence reveals the following: 

Possible site: 50

>>> Seems to have no N-terminal signal sequence
INTEGRAL Likelihood = -8.55 Transmembrane 34 - 50 ( 31 - 54)

Final Results

bacterial membrane --- Certainty=0.4418 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9657> which encodes amino acid sequence <SEQ ID 9658>
was also identified.

The protein has no significant homology with any sequences in the GENPEPT database.

A related DNA sequence was identified in S.pyogenes <SEQ ID 1473> which encodes the amino acid
sequence <SEQ ID 1474>. Analysis of this protein sequence reveals the following:

Possible site: 40

>>> Seems to have no N-terminal signal sequence
INTEGRAL Likelihood = -11.25 Transmembrane 26 - 42 ( 20 - 49)

Final Results

bacterial membrane --- Certainty=0.5501 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
An alignment of the GAS and GBS proteins is shown below: 

Identities = 56/89 (62%), Positives = 71/89 (78%)

Query: 3 MTEQQMIDCLLYEIAKKDKIJJIHRlCNIITFLSIVimiSILWALQDHyKSQITELRTQL 67 

MTE+QMIDCLLYEL KKDK +++ II L+++L+ +S L V+L+ +Y+ QI LRTQL 
Sbjct: 1 >rTEEQMIDCLLyELVKKDKAIKKICSIIIAALTvMLIVVSGLCVSLKSYYEPQIYGLRTQL 60 


Query: 68 SRTQKQLKRASDDRARQTKRIAELTGNGG 96 

SRTQKQLKRAS+ RQTKRIA+LT NGG 
Sbjct: 61 SRTQKQLKRASEQNQRQTKRIADLTNNGG 89 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 458 

A DNA sequence (GBSx0495) was identified in S.agalactiae <SEQ ID 1475> which encodes the amino 
acid sequence <SEQ ID 1476>. Analysis of this protein sequence reveals the following: 

Possible site: 32

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2040 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 459 

A DNA sequence (GBSx0496) was identified in S.agalactiae <SEQ ID 1477> which encodes the amino 
acid sequence <SEQ ID 1478>. Analysis of this protein sequence reveals the following: 

Possible site: 34

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3044 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAD37108 GB:AF109874 unknown [Bacteriophage Tuc2009] 
Identities = 50/143 (34%) , Positives = 67/143 (45%) , Gaps = 29/143 (20%) 

Query: 1 MIPNFE?AFNKETKKM-YG-VDGFELSVRKIYRCSLRDDEFRCGRLETFHFVEDNFDDYIL 58 

MIP RA++K+ ++M YG V+ F+ S+ YR HF +D 

Sbjct: 1 MIPKLRAWDKQDERMSYGEVEYFDDSIN- -YRFD HFCTGADEDVEF 44 f 

Query: 59 MQSTGMFDKNGVEIFDGDIVLTTRL IDY-TYKNFKGWKMLEGRWLIDTGKDA 110 

MQSTG+ DKNGVEI++GDI + + IYY G+EGL + 
Sbjct: 45 MQSTGIKDKNGVEI YEGDI LKLHAI FLAPDDKIGYLEYS PKYGYS 1 1 CEGNRLY RQE 101 

Query: 111 VGLWTEVDENEAIGNIYQNSELL 133 

T E IGNIY+N ELL 

Sbjct: 102 YWASTNKLNYEVIGNIYENPELL 124 

A related DNA sequence was identified in S. pyogenes <SEQ ID 1479> which encodes the amino acid 
sequence <SEQ ID 1480>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4779 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 
Identities = 44/52 (84%) , Positives = 47/52 (89%) 

Query: 1 MIPNFRAFNKETKKMYGVDGFELSVRKIYRCSLADDEFRCGRLETFHFVEDN 52 

MIPNFR FNK+TKKMY +DGF+ S RKIYRCSLADDEFR GRLETFHFVEDN 
Sbjct: 1 MIPNFRGFNKKTKKMYSIDGFKSSERKIYRCSLADDEFRSGRLETFHFVEDN 52

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
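
The "Identities" figure reported with each alignment is the number of aligned positions at which the Query and Sbjct residues are the same. The sketch below is an illustration only, with a hypothetical function name, and shows how that count could be re-derived from a pair of aligned rows. It does not compute "Positives", which would additionally need a substitution matrix.

```python
def count_identities(query_row, sbjct_row):
    """Count identical residues in two aligned rows of equal length.

    Gap characters ('-') never count as identities.
    Returns (identities, alignment_length).
    """
    if len(query_row) != len(sbjct_row):
        raise ValueError("aligned rows must have the same length")
    identities = sum(
        1
        for q, s in zip(query_row, sbjct_row)
        if q == s and q != "-"
    )
    return identities, len(query_row)


# GAS/GBS alignment rows from Example 459 (OCR spacing removed).
query = "MIPNFRAFNKETKKMYGVDGFELSVRKIYRCSLADDEFRCGRLETFHFVEDN"
sbjct = "MIPNFRGFNKKTKKMYSIDGFKSSERKIYRCSLADDEFRSGRLETFHFVEDN"
print(count_identities(query, sbjct))
```

For these two rows the count is 44 identical positions out of 52, which agrees with the "Identities = 44/52" line of the alignment.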

Example 460 

A DNA sequence (GBSx0497) was identified in S.agalactiae <SEQ ID 1481> which encodes the amino 
acid sequence <SEQ ID 1482>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3843 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9655> which encodes amino acid sequence <SEQ ID 9656> 
was also identified. 

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 461 

A DNA sequence (GBSx0498) was identified in S.agalactiae <SEQ ID 1483> which encodes the amino 
acid sequence <SEQ ID 1484>. Analysis of this protein sequence reveals the following: 



Final Results 

bacterial cytoplasm --- Certainty=0.5189 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9653> which encodes amino acid sequence <SEQ ID 9654> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAF435Q3 GB:AF145054 ORF10 [Streptococcus thermophilus
bacteriophage 7201] 
Identities = 92/147 (62%), Positives = 121/147 (81%) 

Query: 15 IEPKPQTRPKFSKFGTYEDPKMKRWRKEVSGWIEKNYDGPFFDDCIKVEVTFYMKAPKTL 74 

IEPKPQTRP+FSKFGTYEDPKMK WR+E S IE+ YDG FF I V+VTFYMKAP ++ 
Sbjct: 7 IEPKPQTRPRFSKFGTYEDPKMKAWRRECSRLIEQEYDGQFFYGPISVDVTFYMKAPLSV 66

Query: 75 SKEPTQRSKGKTIQIYQKFVRELIWHAKKPDIDNLIKAVFDSISDAGYDRIQKSGIvWSD 134 

SK+PT +++ KT ++ F+ E +WH++KPDIDNLIK7A+FDSIS AGY+++ K GIVW+D 
Sbjct: 67 SKKPTPKAFAKTWDAFKKEMJiEFJjVfflSRKPDIDNLIK7ALFDSISTAGTOKVDKKGIVWTD 126 

Query: 135 DNIVCDLRAKKKYSQNPRIKVRIEEID 161 

D+IVC L A+K+YS+NPRI+ I+E++ 
Sbjct: 127 DSIVCKLSAQKRYSENPRIEFEIKELE 153 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 462 

A DNA sequence (GBSx0499) was identified in S.agalactiae <SEQ ID 1485> which encodes the amino 
acid sequence <SEQ ID 1486>. Analysis of this protein sequence reveals the following: 

Possible site: 30 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4007 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 463

A DNA sequence (GBSx0500) was identified in S.agalactiae <SEQ ID 1487> which encodes the amino 
acid sequence <SEQ ID 1488>. This protein is predicted to be pXO1-07. Analysis of this protein sequence
reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3664 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAC38715 GB:AF030367 maturase-related protein [Streptococcus pneumoniae] 
Identities = 146/373 (39%), Positives = 216/373 (57%), Gaps = 18/373 (4%) 

Query: 35 LYDKVYRKDILKVAWFYVKRNKGSAGIDDFTIEKIEAYGVQKFLDEIEDQLRNKICTQPKA 94 

L DK+ ++ + A+ VK NKGSAGID TIEE++ Y Q + ++ ++ +KY+P+ 
Sbjct: 4 LLDKILSRENMLEAYNQVKSNKGSAGIDGMTIEEMDNYLRQNWR-LTKELIKQRKYKPQP 62 

Query: 95 VKRVYIPKANGKKRPLGI PTVRDRWQTA VKI VI EP I FEADFQEFSYGFRPKRSANQAIR 154 

V +V IPK +G R LGIPTV DR++Q A+ V+ PI E F + SYGFRP RS +AI 
Sbjct: 63 VLKVEIPKPDGGIRQLGIPTVMDRMIQCAIVQVMSPICEPHFSDTSYGFRPNRSCEKAIM 122 

Query: 155 EIYKYUTYGCEWVIDADLKGYFDTIPHDKLLLLVKERVTDKSIIKLLSLWLEAGIMEDNQ 214 

++ +YLN G EW++D DL+ +FDT+P D+L+ LV + D L+ +L +G++ + Q 

Sbjct: 123 KLLEYIM3GYEWIVDIDLEKFFDTVPQDRLMSLVHNIIEDGDTESLIRKYLHSGVIINGQ 182 

Query: 215 TOSNILGTPQGGVISPLLANIYLNAIjDRYWKNNRIiEGRGHDAHLIRYADDFVI - LCSNNP 273 

++GTPQGG +SPLL+NI LN LD+ LE RG +RYADD VI + S 

Sbjct: 183 RYKTLVGTPQGGNLSPI.LSNIMLNELDK ELEKRG- -LRFVRYADDCV1TVGSEAA 235 



Query: 332 KSMKSIKGKVKDVIQTGQHLNLPDVMERLNPMLRGWANYFItAGNSKQHFKSIDNYVIYNL 391 

S++ K K4K + Q ++L +E+WJ +RGW NYF GN K SID + L 
Sbjct: 289 DSVRRFKLKLKKLTQRKWSIDLTRRIEQIiNLSIRGWINYFSLGNMKSIVASIDERLRTRL 348 

Query: 392 TIMLRKKHKKSGK 404 

+++ K+ KK + 
Sbjct: 349 RMI IWKQWKKKSR 361 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.


Example 464 

A DNA sequence (GBSx0501) was identified in S.agalactiae <SEQ ID 1489> which encodes the amino 
acid sequence <SEQ ID 1490>. Analysis of this protein sequence reveals the following: 

Possible site: 27 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3833 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9651> which encodes amino acid sequence <SEQ ID 9652>
was also identified. 

A further related DNA sequence (GBSx2517) was identified in S.agalactiae <SEQ ID 7217> which 
encodes the amino acid sequence <SEQ ID 7218>. Analysis of this protein sequence reveals the following:

Possible site: 27

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3833 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



A related DNA sequence was identified in S.pyogenes <SEQ ID 1491> which encodes the amino acid 
sequence <SEQ ID 1492>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2299 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below: 

Identities = 113/163 (69%) , Positives = 128/163 (78%) , Gaps = 25/163 (15%) 

Query: 1 MINNrVlVGRI4TKDAELRyTPSNQAVATFSLAVJSnWFKMQSGERE^FINCTIWRQQAEN 60 

MINN+VLVGRMTKDAELRYTPS AVATF+LAVNR FK+Q+GEREADFINCVIWRQ AEN 
Sbjct: 1 MINNWLVGRMTKDAELRYTPSQVAVATFTIAVNRTFKSQHGEREADFINCVIWRQPAEN 60 

Query: 61 LANWAKKGALVGITGRIQTRNYENQQGQRIYVTEWAENFQLLESRNSQQ-- Q 111 

LANWAKKGAL+G+TGRIQTRNYENQQGQR+YVTEWA+NFQ+LESR +++ 
Sbjct: 61 LANWAKKGALIGVTGRIQTRNYENQQGQRVYVTEVVADNFQMLESRATREGGSTGSFNGG 120 



Query: 112 TNQSGNSSNSY- 
N + +SSNSY 

Sbjct: 121 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 465 

A DNA sequence (GBSx0502) was identified in S.agalactiae <SEQ ID 1493> which encodes the amino 
acid sequence <SEQ ID 1494>. Analysis of this protein sequence reveals the following: 

Possible site: 26 

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood = -1.33 Transmembrane 17 - 33 ( 17 - 33)

Final Results

bacterial membrane --- Certainty=0.1532 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 466 

A DNA sequence (GBSx0503) was identified in S.agalactiae <SEQ ID 1495> which encodes the amino 
acid sequence <SEQ ID 1496>. This protein is predicted to be p22 erf-like protein. Analysis of this protein
sequence reveals the following: 

Possible site: 52 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2469 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database:

>GP:BAA97824 GB:AB044554 orf 17 [Staphylococcus aureus prophage 
phiPV83] 

Identities = 93/183 (50%), Positives = 120/183 (64%), Gaps = 5/183 (2%) 



Query: 1 MRKSESITEYAKAFCKAQLEVKQPLKDKDNPFFKSKYVPLKNVTEArTTAPANNGISFSQ 60
M KSE++ E IS + EVKQPLKDK+NPFFKSKYVPLENV EAI A +G+S++Q
Sbjct: 1 MNKSETVVEINKAMVAFRKEVKQPLKDKNNPFFKSKYVPLENWFAIDEAATPHGLSYTQ 60

Query: 61 DPTTNTENGYIDVATLVNHTSGEWVEYGPLSVKPTKNDVQGAGSAITYAKRYALSAIFGI 120
N +G + VAT++MH SGE++EY P+ + KN QGAGS I+Y KRY+LSAIPGI
Sbjct: 61 W-ALNDVDGRVGVATMLMHESGEYIEYDPVFMNAEKNTPQGAGSLISYLKRYSLSAIFGI 119

Query: 121 TSDQDDDGNEDSKPNNSRQSPKATTKKTQKTGYQTPKISNIQIETYKSDLNDIAKATNQN 180
TSDQDDDGNE S NN +PK T +TQ +T I ++ ++ + K ON
Sbjct: 120 TSDQDDDGNEASGKNN- - -NPKQQT-RTQWASSETIGILRKEVISFTKLIKGTDKFAPQN 175

Query: 181 VEE 183
+ E
Sbjct: 176 IVE 178

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 
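
In alignment listings of this kind, each Query or Sbjct row carries a start and an end coordinate, and the end coordinate equals the start plus the number of residues in the row minus one (gap characters do not advance the coordinate). The sketch below uses a hypothetical helper name and is offered only as an illustrative consistency check for rows whose coordinates were damaged in scanning.

```python
def expected_end(start, aligned_row):
    """Return the end coordinate implied by a start coordinate and a row.

    Gap characters ('-') and stray spaces are ignored, because they do not
    correspond to residues of the sequence.
    """
    residues = sum(1 for ch in aligned_row if ch not in "- ")
    return start + residues - 1


# Rows taken from the Example 466 alignment.
print(expected_end(181, "VEE"))
print(expected_end(
    120,
    "TSDQDDDGNEASGKNN- - -NPKQQT-RTQWASSETIGILRKEVISFTKLIKGTDKFAPQN",
))
```

For the two rows shown, the helper gives 183 and 175, matching the end coordinates printed in the listing.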

Example 467 

A DNA sequence (GBSx0504) was identified in S.agalactiae <SEQ ID 1497> which encodes the amino
acid sequence <SEQ ID 1498>. This protein is predicted to be gp157. Analysis of this protein sequence
reveals the following:

Possible site: 55

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3148 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database:

>GP:AAD44102 GB:AF115103 orf 157 gp [Streptococcus thermophilus 
bacteriophage Sfi21] 



Identities = 59/160 (36%), Positives = 100/160 (61%), Gaps = 3/160 (1%) 

Query: 1 MAYLYELEGIYAQLQSMDLDEETFQDTLDSIDFQSDLEMNIEYFVKMLKNVQADAEKyKA 60 
MA LYEL G + ++ +M++D+ET DTL++ID+ SD EN +E +VK++K+++AD E K 
Sbjct: 1 ^TLYELTGQFLEIYNMEIDDETKLDTLEAIDWTSDYENKVEGWKVIKSLEADIEARKN 60

Query: 61 EKFAFYKKQKQAFAKaEK^KETIHIVWSLSQKKKOTAGMFKVSLRRSKKVEILDETKIPL 120 
EK+ K ++K +K K + ++M + + +VD +FK+ +SK V +++E K+P 

Sbjct: 61 EKKRLDGLNKSDQSKIDKLKAAIAISMTT3TG 119

Query: 121 DYMQEKIEYKPMKAEISKALKSGIDISGVEI.IETESLQVK 160 

+Y + YKP K + + LKSG I G L E +L ++ 
Sbjct: 120 EY--QIATYKPDKKTLKELLKSGKHIEGATLEEEKNMIIR 157 

No corresponding DNA sequence was identified in S. pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 468 

A DNA sequence (GBSx0505) was identified in S.agalactiae <SEQ ID 1499> which encodes the amino 
acid sequence <SEQ ID 1500>. This protein is predicted to be tropomyosin 2. Analysis of this protein
sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4474 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S. pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.


Example 469 

A DNA sequence (GBSx0506) was identified in S.agalactiae <SEQ ID 1501> which encodes the amino 
acid sequence <SEQ ID 1502>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4114 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9649> which encodes amino acid sequence <SEQ ID 9650> 
was also identified. 

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 470 

A DNA sequence (GBSx0507) was identified in S.agalactiae <SEQ ID 1503> which encodes the amino 
acid sequence <SEQ ID 1504>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3799 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 

A related DNA sequence was identified in S. pyogenes <SEQ ID 1505> which encodes the amino acid 
sequence <SEQ ID 1506>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3775 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 43/46 (93%) , Positives = 46/46 (99%) 

Query: 1 MTKQHRETLIWYRASHQEREKLLDFGLVDKSQYVTLLRQLRKKYAI 46 

MTKQHRETLIWYRASHQERE+LLDFGLVDK++YVTLLRQLRKKYAI 
Sbjct: 1 MTKQHRETLIWYRASHQERERLLDFGLVDKARYVTLLRQLRKKYAI 46

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.


Example 471 

A DNA sequence (GBSx0508) was identified in S.agalactiae <SEQ ID 1507> which encodes the amino 
acid sequence <SEQ ID 1508>. Analysis of this protein sequence reveals the following: 

Possible site: 61 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4308 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 

A related DNA sequence was identified in S.pyogenes <SEQ ID 1509> which encodes the amino acid 
sequence <SEQ ID 1510>. Analysis of this protein sequence reveals the following: 

Possible site: 61 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4308 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 76/77 (98%) , Positives = 76/77 (98%) 

Query: 1 MDQEIFNFFNKQIKKDFGKTASKETFAKFASYCAEGIEKNGVKPIFNWINLYAFGTGMTT 60 

MDQEIFNFFNKQIKKDFGKTASKETFAKFASYCAEGIEKNGVKPIFNWINLYAFGTGMTT
Sbjct: 1 MDQEIFNFFNKQIKKDFGKTASKETFAKFASYCAEGIEKNGVKPIFNWINLYAFGTGMTT 60

Query: 61 AEADRLRIERYKQENTL 77 

AEADRLRIERYKQEN L 
Sbjct: 61 AEADRLRIERYKQENAL 77 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 472 

A DNA sequence (GBSx0509) was identified in S.agalactiae <SEQ ID 1511> which encodes the amino
acid sequence <SEQ ID 1512>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2706 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



A related DNA sequence was identified in S.pyogenes <SEQ ID 1513> which encodes the amino acid 
sequence <SEQ ID 1514>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3316 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 52/127 (40%), Positives = 75/127 (58%), Gaps = 1/127 (0%) 

Query: 160 EDRFTOVVEANLGRGLVKFEFDMINDYLIGQWSKI3LFLEAVKVAVANNTOKFNYIARIL 219 

E + + + GR + FE + I ++ N+ ++ A++ AV NN + YI +IL 
Sbjct: 3 EKKLFENFQLTFGRMISPFEIEDIQKWIHEDN^IEVVNLALREAVENNKISWKYINKIL 62 

Query: 220 DNWINDGIKTPEQAYQAQRDFKAKKANKTMQSQSNVPSWSNPDYKGPDLKEFALGSIDDI 279 

+W G T E+ + F K +++ + SNVPSWSNPDYK PDL+EFALGS+D I 

Sbjct: 63 VDWYKSGDTTVEKVRDRLQRFDDSKKQRSVTT-SNVPSWSNPDYKEPDLEEFALGSMDGI 121 

Query: 280 EDGSGDF 286 
I 

Sbjct: 122 E 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics.

Example 473

A DNA sequence (GBSx0510) was identified in S.agalactiae <SEQ ID 1515> which encodes the amino 
acid sequence <SEQ ID 1516>. Analysis of this protein sequence reveals the following: 

Possible site: 26 
>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood = -5.63 Transmembrane 13 - 29 ( 11 - 31)

Final Results

bacterial membrane --- Certainty=0.3251 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9647> which encodes amino acid sequence <SEQ ID 9648> 
was also identified. 

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 474 

A DNA sequence (GBSx0511) was identified in S.agalactiae <SEQ ID 1517> which encodes the amino
acid sequence <SEQ ID 1518>. Analysis of this protein sequence reveals the following:

Possible site: 34 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.5822 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 475 

A DNA sequence (GBSx0512) was identified in S.agalactiae <SEQ ID 1519> which encodes the amino
acid sequence <SEQ ID 1520>. Analysis of this protein sequence reveals the following: 
Possible site: 13 

>>> Seems to have no N-terminal signal sequence 

Final Results

bacterial cytoplasm --- Certainty=0.4175 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 476 

A DNA sequence (GBSx0513) was identified in S.agalactiae <SEQ ID 1521> which encodes the amino 
acid sequence <SEQ ID 1522>. This protein is predicted to be P1-antirepressor homolog. Analysis of this
protein sequence reveals the following: 

Possible site: 37 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3411 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9645> which encodes amino acid sequence <SEQ ID 9646> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database: 



Query: 100 MLQRNEKSKQTOKYFIQVEKDFNSPEKIRmALL^KKITlILTMENNQLQLDLKEAQKQ 159 

M+ + K K++R+YFIQVEK++NSPE 1+ RAL +++ +1 L +N L L L+E+ K+ 
Sbjct: 1 MMSKTAKGKEIRQYFIQVEKNWNSPEMIIQRALEISNARIQELQAQNKSLTLQLEESNKK 60 

Query: 160 ARyLDLIIESKGALROTQIAADYGMSVNKFNKTLLEFGVQHKVNGQWILYKRHMGKGYTD 219 

A YLD+I+ + h TQIAADYG S FN+ L E G+QHKVNGQWILYK +MGKGY 
Sbjct: 61 ASYLDIILGTPDLLATTQIAADYGYSARTFNQLLKEVGIQHKVNGQWILYKAYMGKGYVQ 120 

Query: 220 SHTFDYQDKNGHTRANVTTTWTQKGRLFLYELLKDNKILPLIEQEDI 266 

S +F ++D+ GH R+ +T WTQKGR +Y++LK+N LPLIE++DI 
Sbjct: 121 SKSFAFKDRKGHDRSKPSTYWTQKGRKLIYDVLKENGTLPLIERDDI 167 

A related DNA sequence was identified in S.pyogenes <SEQ ID 1523> which encodes the amino acid 
sequence <SEQ ID 1524>. Analysis of this protein sequence reveals the following: 

Possible site: 19 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4214 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 130/249 (52%) , Positives = 163/249 (65%) , Gaps = 14/249 (5%) 

Query: 19 MNQLINITLNENQEPWSGRDLHNVnNIKTQYTKWLERMSEYGFEEI^TOYIAISQKRLTA 78 

MNQLIN+TLNENQEPWSGRDLH VL IKTQYTKWLERMSEYC-F EN D++AISQKRLTA 
Sbjct: 1 MQLIimiiNENQEPWSGRDLHKVIiEIKTQYrKWLERMSEYGFVENEDFMAISQKRLTA 60 

Query: 79 QGNRTEYIDHVLKlDMAICEIAMLQRNEKSKQTOKYFIQVEKDFNSPEKII'mRALLMADKK 138 

QGN+TEY DHA7LKLDMAKEIAMLQRNEKSK+TOKYFIQVEKDFNSPEKIMARALLMADKK 
Sbjct: 61 QGNQTEYTDHVLKIJDMAKEIAMLQRNEKSKEVRKYFIQvEKI)FNSPEKI^mRALL^^]^ 120 

Query: 139 ITNLTMENNQLQLDLKEAQKQARYLDLIIESKGALRVTQIAA DYGMSVNKFNKTL 193 

+ ++ + + + D + S ++ V ++A + + L 

Sbjct: 121 V HKLEAQIEADRPKVLFADAVSASHTSILVGELAKLLKQNGVNIGATRLFTWL 173

Query: 194 LEFGVQHKA^GQ-WIL-YKRHMGKGYTDSHTFDYQDKNGHTRAWTTTWTQKGRLFLYfit 251

+ G K NG+ W + ++ + G +GH + T T KG+ + 

Sbjct: 174 RKHGYLIKRNGRDWNMPTQKSVELGLIRVKETSITHSDGHITVSKTPLVTGKGQQYFrNK 233 

Query: 252 LKDNNILPIi 260

+ LP+ 
Sbjct: 234 FLNQEYLPV 242 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics.

Example 477 

A DNA sequence (GBSx0514) was identified in S.agalactiae <SEQ ID 1525> which encodes the amino 
acid sequence <SEQ ID 1526>. Analysis of this protein sequence reveals the following: 

Possible site: 44 
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4205 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 

A related DNA sequence was identified in S.pyogenes <SEQ ID 1527> which encodes the amino acid
sequence <SEQ ID 1528>. Analysis of this protein sequence reveals the following:

Possible site: 32

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 21/63 (33%) , Positives = 31/63 (48%) , Gaps = 1/63 (1%) 

Query: 1 MQQFNLKQLREKKGFTQNELADKANVSRSLWGLETGSYSETSTASLKKLAKALDVKIKD 60 

M+ LK R K +Q LAD VSR + +E G Y+ T + + + LD + D 
Sbjct: 1 MKNLKLKAARAGKDLSQQAIiADLVGVSRQTIAAVEKGDYNPTlNLCI-AICRVLDKTLDD 59 

Query: 61 LFF 63

LF+ 

Sbjct: 60 LFW 62 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics.

Example 478 

A DNA sequence (GBSx0515) was identified in S.agalactiae <SEQ ID 1529> which encodes the amino 
acid sequence <SEQ ID 1530>. Analysis of this protein sequence reveals the following: 

Possible site: 26

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.0396 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:BAA17582 GB:D90907 hypothetical protein [Synechocystis sp.] 
Identities = 45/164 (27%), Positives = 79/164 (47%), Gaps = 33/164 (20%)



162 TyYTTIIGQDKPLEHIASHVF\'DMLEQKDIAIQS£SLTNLEI 



Sbj. 
Sbj. 

Query: 220 YYNILNNSFITKKNSELKEQNKRVLTNLGMITLTLFGVRFSKTC 263 

+1 ++ N K ++LTLFG+ F + C 

Sbjct: 199 RISIW YMQDGNRSFKAH VSLTLFGIHFMRVC 229 

A related DNA sequence was identified in S.pyogenes <SEQ ID 1531> which encodes the amino acid 
sequence <SEQ ID 1532>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.0151 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 64/215 (29%) , Positives = 105/215 (48%) , Gaps = 23/215 (10%) 

Query: 65 , QKIAKEIQDWSKNIE-NLQEPSLSIAGPALiaSKFYLEEEEIjRNLFTKLIASSMDKSKN 123 

+K EI SK + +L+EP I PA+ S+ YL E LRN+F + IAS+ ++ K 
Sbjct: 72 EKFKNEIDCEFSKIPQTSLKEPVEYILYPAINESEQYLSNETLRNMFARTIASTFNQDKE 131 

Query: 124 EFNHPSFIEI I KQFDKIDAQNFKI ISDLYFKKGFVATGTYYTTI IGQDKPLEHI 177 

+ H +F++IIKQ +DAQN +1+ IG E++ 

Sbjct: 132 KDLHSAFVQI IKQMTPLDAQNLLLINQ EGNNLIANLQIGVHYSKENLSGTVNK 184 

Query: 178 ASHVFVDNLEQNDIAIQSSSLTNDERLGLIQINYKAHTOEKEYYNILNNSFITKKNSELK 237 

A+++++ L+ + I +SS+ NL RLGLI+++Y + + Y +1 + SE+ 
Sbjct: 185 ANNIYLSKLDYSPDII-ASSIDNLTRIiGLIKVDYLHYPLDSNYESIKQTTIYKSLESEIN 243 

Query: 238 EQNKRVTiTNL GMITLTLFGVRFSKTCL 264 

N +N G ++LT FG +F CL 

Sbjct: 244 TUJLFKTSNTKYDIKIEKGKVSLTDFGKKFISVCL 278 

SEQ ID 1530 (GBS261) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 44 (lane 8; MW 31kDa). 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 479 

A DNA sequence (GBSx0516) was identified in S.agalactiae <SEQ ID 1533> which encodes the amino 
acid sequence <SEQ ID 1534>. Analysis of this protein sequence reveals the following: 

Possible site: 16 

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood = -8.55 Transmembrane 3 - 19 ( 1 - 26)

Final Results

bacterial membrane --- Certainty=0.4418 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 480 

A DNA sequence (GBSx0517) was identified in S.agalactiae <SEQ ID 1535> which encodes the amino 
acid sequence <SEQ ID 1536>. Analysis of this protein sequence reveals the following: 

Possible site: 47 

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -4.99 Transmembrane 35 - 51 ( 30 - 51)

Final Results

bacterial membrane --- Certainty=0.2996 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 

A related DNA sequence was identified in S.pyogenes <SEQ ID 1537> which encodes the amino acid 
sequence <SEQ ID 1538>. Analysis of this protein sequence reveals the following: 
Possible site: 47 

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -4.94 Transmembrane 31 - 47 ( 30 - 51)

Final Results

bacterial membrane --- Certainty=0.2975 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 45/52 (86%) , Positives = 48/52 (91%) 

Query: 1 MNWKKLMLGDLEHTFTSRDGKEKTSVEFEGGVLPALLVLGGITWLIAWLITK 52 

MNWKKLM GDLEHTFT+ DGKEKTS+EFEGGVLPALLVLGGI W+IAW ITK 
Sbjct: 1 MNWKKLMFGDLEHTFTNHDGKEKTSIEFEGGVLPALLVLGGIAWMIAWFITK 52

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 481 

A DNA sequence (GBSx0518) was identified in S.agalactiae <SEQ ID 1539> which encodes the amino 
acid sequence <SEQ ID 1540>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3445 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 482 

A DNA sequence (GBSx0519) was identified in S.agalactiae <SEQ ID 1541> which encodes the amino 
acid sequence <SEQ ID 1542>. Analysis of this protein sequence reveals the following:
Possible site: 37 

»> Seems to have no N-terminal signal sequence 

Final Results 

15 bacterial cytoplasm --- Certainty=0. 3934 (Affirmative) < suco 

bacterial membrane Certainty=0 . 0000 (Not Clear) < suco 

bacterial outside Certainty=0 . 0000 (Not Clear) < suco 

The protein has no significant homology with any sequences in the GENPEPT database. 

20 No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics. 

Example 483 

A DNA sequence (GBSx0520) was identified in S.agalactiae <SEQ ID 1543> which encodes the amino 
acid sequence <SEQ ID 1544>. This protein is predicted to be a repressor protein. Analysis of this protein
sequence reveals the following:

Possible site: 61

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.0905 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9643> which encodes amino acid sequence <SEQ ID 9644>
was also identified.

A related DNA sequence was identified in S.pyogenes <SEQ ID 1545> which encodes the amino acid
sequence <SEQ ID 1546>. Analysis of this protein sequence reveals the following:

Possible site: 55
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3117 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 175/264 (66%) , Positives = 207/264 (78%) , Gaps = 19/264 (7%) 

Query: 34 LGK^IKKYRDTi™LSMREFAKEaGISKAY--VSILEKISIRDPRNGKEIIPSIPIIKKVSDT 91

LG I+K R+ N++ E ++ G+ K Y VS EKN + GK++ KK+++ 
Sbjct: 24 LGDRIRKLREGRNMTQTELSEILGM-KTYTTVSKWEKNENFPKGKDIi KKLAEI 75 

Query: 92 IGISFDDLLNSLDENQIVALWETiCTEKWLTSSTLQKITSTSSQLEQPRQEKVLSFANEQL 151 

++ D LL L ++K K + +1 S +QLEQPRQEKVL+FANEQD 

Sbjct: 76 FNVTSDYLLG LTDSKLGKITIQNEQPEIVSIYMQBEQPRQEKVLNFANEQL 126 

Query: 152 EEQNKWSMFDRKVEETFJTYITDYVEGLVAAGLGAYQEDNLHMEVKLRADDVPDKYDTIA 211 

EEQNK VS+FD+K EETE+YITDYVEGLVAAGLGAYQEDNLHM+VKLR+DDVPD+YDTIA 
Sbjct: 127 EEQNKTVSIFDKKSEETEDYITDYVEGLVAAGLGAYQEDNLHMKVKLRSDDVPDEYDTIA 186 

Query: 212 KVAGNSMEPLIQDNDIjLWKVSSQVDMTOIGIFQVTIGKJIFVTCKLKRDYDGAWYLQSLNKS 271 

KVAG+SMEPLlQDlroLLF+KVSSQVD^lNDIGIFQWGKNFVKKLKRDYDGAWYLQSLNKS 
Sbjct: 187 K^AGDSMEPLIQDIffiLLFIKVSSQVD^I^roIGIFQV>IGKNFVKKLKRDYDGAWYLQSIJ^IKS 246 

Query: 272 YEEIYLSENDNIRTIGEWDIYRE 295 

YEEIYLS4+D+IRTIGEWDIYRE 
Sbjct: 247 YEEIYLSKDDDIRTIGEWDIYRE 270 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 484 

A DNA sequence (GBSx0521) was identified in S.agalactiae <SEQ ID 1547> which encodes the amino
acid sequence <SEQ ID 1548>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3760 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 485 

A DNA sequence (GBSx0522) was identified in S.agalactiae <SEQ ID 1549> which encodes the amino 
acid sequence <SEQ ID 1550>. This protein is predicted to be integrase (ripX). Analysis of this protein 
sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2719 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAB96616 GB:AJ400629 integrase [Streptococcus pneumoniae 
bacteriophage MM1] 
Identities = 36/59 (61%), Positives = 48/59 (81%), Gaps = 1/59 (1%) 

Query: 2 KIYGDYHTHLFRHSHISFLAEKGIPLNAIMDRVGHSDPKTTLSIYSHTTVNMKE-IINK 59

KI + +H+FRHSHISFLAE G+P+ +IMDRVGHS+ K TL IYSHTT +M++ ++NK
Sbjct: 312 KIEKNLSSHIFRHSHISFLAEEGLPIKSIMDRVGHSNAKMTLEIYSHTTEDMEDKLVNK 370

A related DNA sequence was identified in S.pyogenes <SEQ ID 1551> which encodes the amino acid 
sequence <SEQ ID 1552>. Analysis of this protein sequence reveals the following: 

Possible site: 20 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2719 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 63/71 (88%) , Positives = 66/71 (92%) 

Query: 1 MKIYGDYHTHLFRHSHISFLAEKGIPLNAIMDRVGHSDPKTTLSIYSHTTVNMKEIINKQ 60
+KIYGDYHTHLFRHSHISFLAEKGIPLNAIMDRVGHSDPKTTLSIYSHTTVNMKEIINKQ

Sbjct: 1 LKIYGDYHTHLFRHSHISFLAEKGIPLNAIMDRVGHSDPKTTLSIYSHTTVNMKEIINKQ 60

Query: 61 TAPFVPLLKSE 71
T PF +K +
Sbjct: 61 TDPFKTGIKQK 71
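
Each Query or Sbjct row carries the coordinates of its first and last residue, so the number of residues on the row (gap characters excluded) must equal end minus start plus one. The sketch below, offered under that assumption, checks a single row; the function name `coordinates_consistent` and the row pattern are illustrative and not taken from the analysis software.

```python
import re

# Assumed row shape: "<Query|Sbjct>: <start> <residues> <end>"
ROW = re.compile(r"^(Query|Sbjct):\s*(\d+)\s+(\S+)\s+(\d+)\s*$")


def coordinates_consistent(line):
    """True when start + residue count - 1 equals the stated end."""
    match = ROW.match(line.strip())
    if match is None:
        raise ValueError("not an alignment row: %r" % line)
    start = int(match.group(2))
    sequence = match.group(3)
    end = int(match.group(4))
    residues = sum(1 for ch in sequence if ch != "-")  # gaps hold no residue
    return start + residues - 1 == end


print(coordinates_consistent("Query: 61 TAPFVPLLKSE 71"))  # True
print(coordinates_consistent("Sbjct: 61 TDPFKTGIKQK 71"))  # True
```

A row that fails this check has usually lost or gained characters in transcription, which makes the test a convenient first screen before comparing sequences residue by residue.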

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 486 

A DNA sequence (GBSx0523) was identified in S.agalactiae <SEQ ID 1553> which encodes the amino
acid sequence <SEQ ID 1554>. This protein is predicted to be 50S ribosomal protein L19 (rplS). Analysis
of this protein sequence reveals the following:
Possible site: 54

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3331 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>


A related GBS nucleic acid sequence <SEQ ID 9641> which encodes amino acid sequence <SEQ ID 9642> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAC01534 GB:U88973 ribosomal protein L19 [Streptococcus thermophilus] 
Identities = 110/115 (95%) , Positives = 112/115 (96%)

Query: 25 MNPLIQSLTEGQLRSDIPEFRAGDTVRVHAKVVEGTRERIQIFEGVVISRKGQGISEMYT 84

MNPLIQSLTEGQLR+DIP FR GDTVRVHAKVVEGTRERIQIFEGVVISRKGQGISEMYT
Sbjct: 1 MNPLIQSLTEGQLRTDIPSFRPGDTVRVHAKVVEGTRERIQIFEGVVISRKGQGISEMYT 60

Query: 85 VRKISGGIGVERTFPIHTPRVDKIEVVRYGKVRRAKLYYLRALQGKAARIKEIRR 139

VRKIS GIGVERTFPIHTPRVDKIEVVRYGKVRRAKLYYLRALQGKAARIKEIR+
Sbjct: 61 VRKISSGIGVERTFPIHTPRVDKIEVVRYGKVRRAKLYYLRALQGKAARIKEIRK 115

A related DNA sequence was identified in S.pyogenes <SEQ ID 1555> which encodes the amino acid
sequence <SEQ ID 1556>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4849 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 111/115 (96%) , Positives = 113/115 (97%) 

Query: 25 MNPLIQSLTEGQLRSDIPEFRAGDTVRVHAKVVEGTRERIQIFEGVVISRKGQGISEMYT 84

MNPLIQSLTEGQLRSDIP FR GDTVRVHAKVVEGTRERIQIFEGVVISRKGQGISEMYT
Sbjct: 1 MNPLIQSLTEGQLRSDIPNFRPGDTVRVHAKVVEGTRERIQIFEGVVISRKGQGISEMYT 60

Query: 85 VRKISGGIGVERTFPIHTPRVDKIEVVRYGKVRRAKLYYLRALQGKAARIKEIRR 139

VRKISGGIGVERTFPIHTPRVDKIEV+R+GKVRRAKLYYLRALQGKAARIKEIRR
Sbjct: 61 VRKISGGIGVERTFPIHTPRVDKIEVIRHGKVRRAKLYYLRALQGKAARIKEIRR 115

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 487 

A DNA sequence (GBSx0524) was identified in S.agalactiae <SEQ ID 1557> which encodes the amino 
acid sequence <SEQ ID 1558>. This protein is predicted to be ISL2 protein. Analysis of this protein 
sequence reveals the following: 

>>> Seems to have an uncleavable N-term signal seq

Final Results

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 



Query: 1 MKAQAIVTSQGRIVSLDIAVNYCHDMKLFKMSRRNIGQAAKILADSGYQGIMKMYSQAQT 60

MK QAIVTSQGRIVSLDI VNYCHDMKLFKMSRRNIGQA KILADSGYQG+MK+Y QAQT
Sbjct: 1 MKTQAIVTSQGRIVSLDITVNYCHDMKLFKMSRRNIGQAGKILADSGYQGLMKIYPQAQT 60



Sbjct: ( 



Query: 121 GMINRELGF 129 

G+IN ELGF 
Sbjct: 121 GIINHELGF 129 



No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 488

A DNA sequence (GBSx0526) was identified in S.agalactiae <SEQ ID 1559> which encodes the amino
acid sequence <SEQ ID 1560>. Analysis of this protein sequence reveals the following:



>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood =       Transmembrane     -  97 (  67 -
INTEGRAL Likelihood = -6.3  Transmembrane   8 -  24 (   6 -
INTEGRAL Likelihood = -2.7  Transmembrane 120 - 136 ( 120 -

Final Results

bacterial membrane --- Certainty=0.5394 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>




The protein has homology with the following sequences in the GENPEPT database: 

>GP:BAB04382 GB:AP001509 unknown conserved protein in others 
[Bacillus halodurans] 
Identities = 53/150 (35%) , Positives = 82/150 (54%) , Gaps = 1/150 (0%) 

Query: 1 MLNPYKRIPTLGLLATFLLPIFHFGRYSGLGTNLIEASFTNKNLYDYDWLLKLCLTVITL 60 

M N R F GL+ L +1 Y+G G +++E SFT +++ Y +L KL T +T+ 

Sbjct: 251 MKNHTTOAFVGGLIIVALTYIIGSYDYNGRGLDMLEDSFT-QDVPPYAFLAKLVFTAVTM 309 

Query: 61 AAGYQGGEVTPLFAIGASLGVIIAPILGLPVILVAALGYTSVFGSATNTLLGPILIGGEV 120 

G+ GGE PLF +GA+LG + + LP+ +AALG FG NT + L+G E+ 
Sbjct: 310 GMGFVGGEAIPLFFVGATLGNTLHAFIDLPLSFLAALGMIVTFGGGANTPIAAFLLGVEM 369

Query: 121 FGFANTPYFVIVCLVAYSISHAHTIYGAQS 150

F +F + CL +Y S H ++ +Q+ 

Sbjct: 370 FNGKGIEFFFVACLTSYLFSGHHGLKPSQT 399 



A related DNA sequence was identified in S.pyogenes <SEQ ID 1561> which encodes the amino acid 
sequence <SEQ ID 1562>. Analysis of this protein sequence reveals the following: 



Possible site: 35
>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = Transmembrane 337 - 353 ( 327 - 355)
INTEGRAL Likelihood = Transmembrane 264 - 280
INTEGRAL Likelihood = Transmembrane 167 - 183
INTEGRAL Likelihood = Transmembrane 223 - 239
INTEGRAL Likelihood = Transmembrane 20 - 36
INTEGRAL Likelihood = Transmembrane 102 - 118
INTEGRAL Likelihood = Transmembrane 300 - 316

Final Results

bacterial membrane --- Certainty=0.5798 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the databases:

>GP:BAB04382 GB:AP001509 unknown conserved protein in others 
[Bacillus halodurans] 
Identities = 129/397 (32%) , Positives = 210/397 (52%) , Gaps = 14/397 (3%) 

Query: 20 VLGLVGLALPIGGiAVGWDVIFGKGLLFLSEYRDHHLFLLLPFLALAGLVIVFLYDKLG- 78

+L + + IG VG + L E R++ + +L FL LAGL + +LY K G 

Sbjct: 9 LLTWIFFGIMIGAIVGSATALLLTVNDHLGETRENRPVfFVL-FLPLAGLALGYLYMKAGT 67 

Query: 79 ---KEVRQGMGLVFQVGHGQKNQIPPMLIPLILFSTWVTHLFGASAGREGVAVQIGATIS 135 
E+ +G LV + G K ++ L PL+ T++T LFG S GREG A+Q+G +++

Sbjct: 68 SAGNELYKGNNLVIESVC^-KGKMLLRLGPLVYLGTFMTILFGGSTGREGAAIQMGGSVA 126 

Query: 136
I LL+ G++AGF F TPI A +F +E+ +G I
Sbjct: 127

Query: 195
Sbjct: 187

Query: 253
R AF+G L+ + L +IG Y+G G +++ +F+ Q 4 Y +L K++
Sbjct: 247

Query: 311 VTVISLSAGFQGGEVTPLFAIGASLGIVIAPYLGLPVLLVAALGYTTVFGSATNTFWAPI 370

T +++ GF GGE PLF +GA+LG L ++ LP+ +AALG FG NT A
Sbjct: 304 FTAVTMGMGFVGGEAIPLFFVGATLGNTLHAFIDLPLSFLAALGMIVTFGGGANTPIAAF 363

Query: 371
Sbjct: 364



An alignment of the GAS and GBS proteins is shown below: 

Identities = 91/147 (61%) , Positives = 111/147 (74%) 

Query: 3 NPYKRIFTLGLLATFLLFIFHFGRYSGLGTNLIEASFTNKNLYDYDWLLKLCLTVITLAA 62

NPY RI +G L + L I H GRYSGLGTNLI A+F+ + + YDWLLK+ +TVI+L+A
Sbjct: 259 NPYFRIAFIGALLSICLMIGHVGRYSGLGTNLIAAAFSGQTILTYDWLLKMIVTVISLSA 318

Query: 63 GYQGGEVTPLFAIGASLGVIIAPILGLPVILVAALGYTSVFGSATNTLLGPILIGGEVFG 122

G+QGGEVTPLFAIGASLG+++AP LGLPV+LVAALGYT+VFGSATNT PI IG EVFG
Sbjct: 319 GFQGGEVTPLFAIGASLGIVIAPYLGLPVLLVAALGYTTVFGSATNTFWAPIFIGIEVFG 378

Query: 123 FANTPYFVIVCLVAYSISHAHTIYGAQ 149

N + + AY +SH H+IY Q 
Sbjct: 379 PENALAYFVTSAAAYMVSHRHSIYSYQ 405 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 489 

A DNA sequence (GBSx0527) was identified in S.agalactiae <SEQ ID 1563> which encodes the amino 
acid sequence <SEQ ID 1564>. Analysis of this protein sequence reveals the following: 

Possible site: 27 

>>> Seems to have a cleavable N-term signal seq.

INTEGRAL Likelihood = -8.65 Transmembrane 47 - 63 ( 45 - 70)

INTEGRAL Likelihood = -5.04 Transmembrane 219 - 235 ( 208 - 237)

INTEGRAL Likelihood = -3.35 Transmembrane 168 - 184 ( 168 - 187)

INTEGRAL Likelihood = -0.48 Transmembrane 141 - 157 ( 141 - 157)

Final Results

bacterial membrane --- Certainty=0.4461 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9317> which encodes amino acid sequence <SEQ ID 9318> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:BAB04382 GB:AP001509 unknown conserved protein in others 
[Bacillus halodurans] 
Identities = 75/223 (33%) , Positives = 119/223 (52%) , Gaps = 18/223 (8%) 

Query: 17 FSLLIGGVVGAITAVFGRVILFLTAFRSDYIAYLLPFLSIVGLFIVFVYQKFGGKS 72

F ++IG +VG+ TA+ V L R + ++L FL + GL + ++Y K G +
Sbjct: 15 FGIMIGAIVGSATALLLTVNDHLGETRENRPWFVL-FLPLAGLALGYLYMKAGTSAGNEL 73

Query: 73 VKGMGLVFEVGHGNEETIPKRLVPLVILTTWLTHLFGGSAGREGVAVQIGATVSHYFQKY 132

KG LV E G + + RL PLV L T++T LFGGS GREG A+Q+G +V+ K
Sbjct: 74 YKGIMLVIESVQGKGKML-LRLGPLVYLGTFMTILFGGSTGREGAAIQMGGSVAEAVNKL 132

Query: 133 CRLQNASQLFLVM-GMAAGFAGLFQTPLAATFFAIEVLWGRLMVSYVLPSLIAALTANF 191

+++ L+M G++AGF F TP+ A F +E+ +GRL ++P L4A+ ++ 

Sbjct: 133 FKVKLIDTRILLMGGISAGFGAAFGTP-TAAIFGMENIASLGRLKFEALVPCLVASFVGHY 192 

Query: 192 VSHSLGLEKFSH SIATSMALTPDIILKLLVLGLCFGL 228 

+ EKF H IAT ++ K+++L + F L 

Sbjct: 193 TT EKFWHVEHE1CFI IATVPEVSALTFSKVILLAIVFSL 230 

There is also homology to SEQ ID 1562. 

A related GBS gene <SEQ ID 8577> and protein <SEQ ID 8578> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 5 
McG: Discrim Score: 9.66 
GvH: Signal Score (-7.5): -1.12 
Possible site: 27

>>> Seems to have a cleavable N-term signal seq.

ALOM program count: 7 value: -10.99 threshold: 0.0

INTEGRAL Likelihood =-10.99 Transmembrane     - 344 ( 314 - 354)
INTEGRAL Likelihood = -8.65 Transmembrane  47 -  63 (  45 -  70)
INTEGRAL Likelihood = -6.32 Transmembrane 255 -     ( 253 - 272)
INTEGRAL Likelihood = -4.41 Transmembrane 214 - 230 ( 208 - 238)
INTEGRAL Likelihood = -3.35 Transmembrane 168 - 184 ( 168 - 187)
INTEGRAL Likelihood = -2.76 Transmembrane 367 - 383 ( 367 - 383)
INTEGRAL Likelihood = -0.48 Transmembrane 141 - 157 ( 141 - 157)
PERIPHERAL Likelihood = 0.42 94

modified ALOM score: 2.70



Final Results 

bacterial membrane --- Certainty=0.5394 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

ORF01989(349 - 1491 of 1794) 

GP|4512350|dbj|BAA75315.1||AB011836 (15 - 399 of 424) similar to Bordetella parapertussis
transposase for insertion sequence element (27%-identity) {Bacillus halodurans}
PIR|T44296|T44296 hypothetical protein [imported] - Bacillus halodurans
%Match =15.4 

%Identity =33.4 %Similarity =54.7 

Matches = 129 Mismatches = 167 Conservative Sub.s = 82 

222 252 282 312 342 372 402 432 

MY*RKSKTII^TM*YEQLSKTL*QNLVFIKRRIL*TVIKRFDOT 

I -II =11= II- I I I 
MNKTFWLTLLTWIFFGIMIGAIVGSATALLLTVNDHLGETRE 



462 492 513 540 570 600 630 660 

DYIAYLLPFLSIVGLFIVFVYQKFG GKSV-KGMGLVFEVGHGNEETIPKRLVPLVILTTWLTHLFGGSAGREGVAVQ 

= ::| II : II : ::| I I I = I I I I I = I = = I I III I l = = l Hill III |:| 
NRPWFVL-FLPLAGIALGYLYMKAGTSAGNELYKGICMLVIESVQG-KGKMLLRLGPLVYLGTFMTILFGGSTGREGAAIQ 
60 70 80 90 100 110 120 

690 720 747 777 807 837 867 894

IGATVSHYFQKYCRLQNASQLFLWG-MAAGFAGLFQTPLAATFFAI3VLWGRLMVSYVLPSLIAALTANFVEHSL-GL 
:| :|: | ::: |:|| ::||| | ||: | | :|: :||| ::| |:|:: :: : : : 

MGGSVAEAVNECLFKVKLIDTRILLMGGISAGFGAaFGTPITARIFGMSMASLGRLKFFJUjVPCLVASFVGHYTTEKFWOT 
5 130 140 150 160 170 180 190 200 

924 954 984 1014 1041 1071 1101 1131 

EKFSHSIATSMALTPDIILKLLVLGLCFGLCGNLFAYLLAKA-KLIASSRLLIIPYKRIFTLGLIjATFIiLFIFHFGRYSGL 
I III = = l = = = l = 111=1 II = I I I 11= I =1 1 = 1 

EHEKFIIATVPEVSALTFSKVILIAIVFSLVSVLY^

210 220 230 240 250 260 270 280 

1161 1191 1221 1251 1281 1311 1341 1371 

GTNLIFASFTNKNLYDYDWLLKLCLTVITI^GYQGGEOTPLFAIGASLGVirAPILGLPVILVAALGYTSVFGSATNTL 
I ===| III === I =] || =| =1= |= III III =||=|| = = 11= ==ll|| II ||

GLDMLEDSFT-QDVPPYAFmKLVFTAVTMGMGFVGGFAIPLFFVGATLGNTLHAFIDLPLSFMAMMIVTFGGGAOTP 
290 300 310 320 330 340 350 

1401 1431 1461 1491 1521 1551 1581 1611 

LGPILIGGEVFGFANTPYFVIVCLVAYSISHAHTIYGAQSR*LVMSFKRVYQFVERNIPFSFLFS*SL*KWSLSIL*MQK
: |:| 1:| :| : M =1 I I == =1= 

370 380 390 400 410 420 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics.

Example 490 

A DNA sequence (GBSx0528) was identified in S.agalactiae <SEQ ID 1565> which encodes the amino
acid sequence <SEQ ID 1566>. Analysis of this protein sequence reveals the following:

Possible site: 33
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3568 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database:

>GP:AAB98234 GB:U67480 chorismate mutase/prephenate dehydratase
(pheA) [Methanococcus jannaschii]
Identities = 26/85 (30%) , Positives = 46/85 (53%) , Gaps = 1/85 (1%)

Query: 2 ELEEIRQEIDEIDQQLVSLLETRMGLILEVIAFKKKHRLPVIjDNNRENEVIJQNVLKKVQN 61 

+L EIR++IDEID +++ L+ R L +V K + +P+ D RE + + + K + 
Sbjct: 4 KLAEIRKKIDEIDNKILKLIAERNSLAKDVAEIKNQLGIPlKDPEREKriYDRIRKIiCKE 63 


Query: 62 HQFDDVIRATFKDIMTE-SRVYQKE 85 

H D+ I 1+ E ++ QK+ 

Sbjct: 64 HNVDENIGIKIFQILIEHNKALQKQ 88 

A related DNA sequence was identified in S.pyogenes <SEQ ID 1567> which encodes the amino acid
sequence <SEQ ID 1568>. Analysis of this protein sequence reveals the following:

Possible site: 37

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2356 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below:

Identities = 45/91 (49%) , Positives = 62/91 (67%)

Query: 1 MELEEIRQEIDEIDQQLVSLLETRMGLILEVIAFKKKHRLPVLDNNRENEVLNMVLKKVQ 60

M LE+IRQEI+ ID LV+LLE RM L+ +V A+K + LPVLD REN++L+ V V+
Sbjct: 1 MRLEKIRQEINGIDHHLVMjLEKRMALVEQVTAYKIANHLPVLDQAREKQILDRVSYLVK 60

Query: 61 NHQFDDVIRATFKDIMTESRVYQKENIVDGD 91 

+ F+ I TFK IM+ SR YQ +++ GD 
Sbjct: 61 DQAFEPAIHETFKTIMSLSRQYQTQHLTGGD 91 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 491 

A DNA sequence (GBSx0529) was identified in S.agalactiae <SEQ ID 1569> which encodes the amino 
acid sequence <SEQ ID 1570>. This protein is predicted to be neuraminidase. Analysis of this protein 
sequence reveals the following: 

Possible site: 19 

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -3.35 Transmembrane 28 - 44 ( 28 - 47)

Final Results

bacterial membrane --- Certainty=0.2338 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10191> which encodes amino acid sequence <SEQ ID
10192> was also identified.

The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAA51473 GB:X72967 neuraminidase [Streptococcus pneumoniae] 
Identities = 294/504 (58%), Positives = 380/504 (75%), Gaps = 10/504 (1%) 



Sbjct: 299 EEVQKRSQLFKRSDLEKKLPEGAALTEKTDIFESGRRGKPNKDGIKSYRIPALLKTDKGT 3 
Query: : 



Query: 423 NMDMALVQDTSSKTKKIFSIYDMFPEGRGVISIANTPEKEYTQIGGQSYliNLYNNGKKSK 482 

N+DM LVQD +TKRIFSIYDMFPEG+G+ +++ E+ Y +1 G++Y LY G+K 
Sbjct: 415 NIDMVLVQDP--ETKRIFSIYDMFPEC-KGIFGMSSQKEERYKKIDGKTYQILYREGEKG- 471 

Query: 483 VFTIRDKGIVYNFKGKKTDYHVITETTKSDHSNLGDIYKGKQLLGNIYFTKHKTSPFRLA 542 

+TIR+ G VY GK TDY V+ + K +S+ GD+YKG QLLGNIYFT +KTSPFR+A 
Sbjct: 472 AYTIRENGTVYTPDGKATDYRVWDPVKPAYSDKGDLYKGNQLLGNIYFTTNKTSPFRIA 531 

Query: 543 KSSYVWMSYSDDDGRTWSSPRDITASLRQKGMKFLGIGPGKGIVLKWGPHAGRIIIPAYS 602 

K SY+WMSYSDDDG+TWS+P+DIT ++ MKFLG+GPG GIVL+ GPH GRI+IP Y+ 
Sbjct: 532 KDSYLWMSYSDDDGKTWSAPQDITPMVK&DWMKFLGVGPGTGIVLKNGPHKGRILIPVYT 591 



Sbjct: 592 TNOTSHLNGSQSSRIIYSDDHGKTl'^.GE^A\iroNRQV-DGQKIHSSTMNNRRAQNTESTV 650 

Query: 663 VQLKNGDIKLFMPJ^TGNLEVATSKDGGET>IQNHVKRYKEVHDAYVQLSAIRFEHDKKEY 722 

VQL NGD+KLFMR LTG+L+VATSKDGG TW+ +KRY +V D YVQ+SAI H+ KEY 
Sbjct: 651 VQITOGDVKLFMRGLTGDLQVATSKDGGVTWEKDIKRYPQVKDVYVQMSAIHTMHEGKEY 710 

Query: 723 IlSLVmmGPGY^QXXSYmiAQmmGSFmhYKmiQDGSFAYNSVQQlM^KFGVhYE 782

I+L NA GP KR++G LA+V NG WL H+ IQ G FAYNS+Q+L N ++G+LYE 
Sbjct: 711 IILSNAGGP- -KRENGrWHLARVEENGELTWLKHNPIQKGEFAYKSLQELGNGEYGILYE 768 



No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 492 

A DNA sequence (GBSx0530) was identified in S.agalactiae <SEQ ID 1571> which encodes the amino 
acid sequence <SEQ ID 1572>. This protein is predicted to be unnamed protein product (gatC). Analysis of 
this protein sequence reveals the following:



>>> Seems to have an uncleavable N-term signal seq



Likelihood =-12 
Likelihood 
Likelihood 
Likelihood 
Likelihood 
Likelihood 
Likelihood 
Likelihood 
Likelihood 
Likelihood 



Transmembrane 154 



■ 170 ( 
• 119 ( 
37 ( 



Transmembrane 
Transmembrane 
Transmembrane 



149 - 178! 



448 - 464 ( 444 - 



Transmembrane 3 56 - 



{ 352 - 373! 



376 - 392 ( 375 - 



Final Results

bacterial membrane --- Certainty=0.6052 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



A related DNA sequence was identified in S.pyogenes <SEQ ID 1573> which encodes the amino acid 
sequence <SEQ ID 1574>. Analysis of this protein sequence reveals the following:



Possible site: 35
>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood = Transmembrane     - 170 ( 150 -
INTEGRAL Likelihood = Transmembrane     - 120 (  99 -
INTEGRAL Likelihood = Transmembrane     - 463 ( 442 -
INTEGRAL Likelihood = Transmembrane     -  38 (  11 -
INTEGRAL Likelihood = Transmembrane     - 393 ( 375 -
INTEGRAL Likelihood = Transmembrane     -  64 (  46 -
INTEGRAL Likelihood = Transmembrane     - 347 ( 329 -
INTEGRAL Likelihood = Transmembrane     - 373 ( 353 -
INTEGRAL Likelihood = Transmembrane     - 294 ( 276 -
INTEGRAL Likelihood = Transmembrane     - 256 ( 240 -

Final Results

bacterial membrane --- Certainty=0.5925 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below: 

Identities = 419/482 (86%) , Positives = 447/482 (91%)



Query: 1 MQVFLNIVNKFFDPIIHMGSGVVMLIVMTGLAMIFGVKFSKALEGGIKLAIALTGIGAII 60

MQ FL+I+NK I +GSGVVMLIVMTGLAMIFGVKF+KALEGGIKLAIALTGIGAII
Sbjct: 2 MQPFLDIINKILGFPIQLGSGVVMLIVMTGLAMIFGVKFTKALEGGIKLAIALTGIGAII 61

Query: 61 GILTGAFSESLQAFVKNTGINLSIIDVGWAPLATITWGSPYTLYFLLIMLIVNIVMIVMK 120

GILTGAFSESLQAFVKNTGI+L+IIDVGWAPLATITWGSPYTLYFLL+ML+VNIVMIVMK
Sbjct: 62 GILTGAFSESLQAFVKNTGISLNIIDVGWAPLATITWGSPYTLYFLLVMLVVNIVMIVMK 121

Query: 121 KTDTLDVDIFDIWHLSITGLLIMWYAKKNNLPTLLSVIIATVAIIFVGVLKIINSDLMKP 180

KTDTLDVDIFDIWHLSITGLLIMWYA +N+LP +S++IATVA+I VGVLKIINSDLMKP
Sbjct: 122 KTDTLDVDIFDIWHLSITGLLI^WYAAROT^LPVFVSLLIATVAVILVGVLKIINSDLMKP 181 

Query: 181 TFDDLLGTGPTSPMTSTHMNYIMPI1IWLDKLFDKVFPGLDKYDFDAAKLNKAIGFWGS 240 

TFDDLLGTGP SPMTSTHMNYMMNPIIMVLDK+FDKVFPGLDKYDFDAAKtNK IGFWGS 
Sbjct: 182 TFDDLLGTGPQSPMTSTHMKTYMMNPIIMVLDKIFDKVFPGLDKYDFDAAKLNKKIGFWGS 241 

Query: 241 KFFIGMILGLVIGIMGNPVFSFAALGGWFSLGFTAGACLELFSLIGSWFIAAVEPLSQGI 300 

KFFIGM LG VIGIMG+P F+ +4- WF LGFTAGACLELFSLIGSWFIAAVEPLSQGI 
Sbjct: 242 KFFIGMALGFVIGIMGDPHFTVESIKNWFGLGFTAGACLELFSLIGSWFIAAVEPLSQGI 301 

Query: 301 TNFANGKMHGRRFNIGLDWPFIAGRAEIWACANILAPIMLVEAILLSKVGNGILPLAGII 360

TNFAN +MHGRRFNIGLDWPFIAGRAEIWACANIIAPIML+EA+LLSKVGNGILPLAGII 
Sbjct: 302 TNFANARMHGRRFNIGLDWPFIAGRAEIWACANILAPIMLIEAVLLSKVGNGILPLAGII 361 

Query: 361 AMGVTPALLWTRGRLIRMITFGTLLLPLFI.L.SGTMIAPFATELAKKVGAFPAGARAGSL 420 

AMG+TPALLWTRGRLIRMI FG+LLLPLFLLSGTMIAPFATELAKKVGAFPAG AGSL 
Sbjct: 362 AMGMTPALLWTRGRLIRMI I FGSLLLPLFLLSGTMIAPFATELAKKVGAFPAGTSAGSL 421 

Query: 421 ITHSTLEGPMEKIFGYVIGKATTGQLSAIITLIIFATAYLGLFMWYAKQMKRRNAEYAAN 480 

ITHSTLEGPMEKI FGYV IG+ATTGQ+++I ITIil IF Yt. LF WYA QMK RNAEYA 
Sbjct: 422 ITHSTLEGPMEKIFGYVIGQATTGQIASIITLIIFVAIYLSLFAWYANQMKARHAEYAKT 481 

Query: 481 QK 482 
K 

Sbjct: 482 MK 483 



A related GBS gene <SEQ ID 8579> and protein <SEQ ID 8580> were also identified. Analysis of this
protein sequence reveals the following:

Lipop: Possible site: -1 Crend: 9
McG: Discrim Score: 4.31
GvH: Signal Score (-7.5): -2.64
Possible site: 34

>>> Seems to have an uncleavable N-term signal seq
ALOM program count: 6 value: -12.63 threshold: 0.0
Likelihood = 
Likelihood =-11. S 
Likelihood = - 
Likelihood = - 
Likelihood = - 
Likelihood = - 



INTEGRAL 
INTEGRAL 
INTEGRAL 
INTEGRAL 
INTEGRAL 



Transmembrane 21 - 



170 ( 149 - 178) 

119 ( 98 - 123) 

37 ( 14 - 40) 

- 63 ( 45 - 68) 

- 259 ( 235 - 265) 

- 284 ( 268 - 284) 



85 



127 



*** Reasoning Step: 3 



Final Results 

bacterial membrane --- Certainty=0.6052 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the databases: 

ORF00838(343 - 1122 of 1455)

EGAD|91348|EC2092(9 - 344 of 451) PTS system, galactitol specific IIC component
{Escherichia coli} OMNI|NT01EC2494 PTS system galactitol-specific enzyme IIC component
SP|P37189|PTKC_ECOLI PTS SYSTEM, GALACTITOL-SPECIFIC IIC COMPONENT (EIIC-GAT) (GALACTITOL-
PERMEASE IIC COMPONENT) (PHOSPHOTRANSFERASE ENZYME II, C COMPONENT).

GP|1736809|dbj|BAA15955.1||D90847 PTS system, Galactitol-specific IIC component (EIIC-GAT)
(Galactitol-permease IIC component) (Phosphotransferase enzyme II, C component).
{Escherichia coli} GP| 17884
%Match =10.9

%Identity =29.8 %Similarity =59.2
Matches = 68 Mismatches = 88 Conservative Sub.s = 67

282 312 342 372 402 432 462 492 

LS*HI*NWN*S*RRRTOC!vFIjNIVNKFFDPIIHMGSGVA14LIVM^ 

I: :| h= 11 = = = 1 = 1 = 1 == 1= = I = III =ll = = 
MFSEVMRYILDLGPTVMLPIVIIIFSKILGMKAGDCFKAGLHIGIGFVGIGLVIGLML
10 20 30 40 50 

522 552 582 612 642 672 702

GAFSESLQAFVTCOT'Giro^SIIDVGWAPLATITWGSPYTLYFLLIKLIVTIIvMIvMKKTDTLDvIlIFDIWHLSITGLLIM^ 
: : :| :] :|| ::|||| = :|| | | : | ::||: |:: : | ::|||::|||:= || |:

DSIGPAAKAMAENFDimHVVDVGWPGSSPMTWASQIA™ 

70 80 90 100 110 120 130 

747 774 804 834 864 894 

WYAKKN-NLPTLIiSVI IATVAI I FVGVTiKI INSDLMKPTFDDLLGTGPTSPMTSTH

1=1= 1= I = I ==l 

ATGSVIMIGMAGWIHAAFVYKLGDWFARDTRNFFELEGIAIPHGTSAYMG 

150 160 170 180 

924 954 984 1014 1044

MNYM^PIII»IVljDKLFDKOTPGLDKYDFDAAKliNKAIGFWGSKFFIGMILGLv'IXIM 

II :==] = =1= ||::: I I = = I =1 =1 ==ll=l 1= 

PIAVTjVDAIIEKI-PGVNRIKFSADDIQRKFGPFGEPVTVGFVMGLIIGIIAGTO 

200 210 220 230 240 250 


1092 1122 1152 1182 1212 1242 

- GNPWSFASIRWLVFFFVL05GACLEVGLF*LVSWQLLQ*NHFLRKLLILLMVNAXX* 

111= I = =11 = 

-WSASLIFIPLTILIAVCVPGNQVLPFGDLATIGFFVAt-lAVAVHRGKLFRTLISGVIIMSITLWIATQTIGLHTQIAAN 
320 330 340 350 360 370 380

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 493 

A DNA sequence (GBSx0531) was identified in S.agalactiae <SEQ ID 1575> which encodes the amino 
acid sequence <SEQ ID 1576>. Analysis of this protein sequence reveals the following:

Possible site: 14

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.0302 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 

A related DNA sequence was identified in S.pyogenes <SEQ ID 1577> which encodes the amino acid
sequence <SEQ ID 1578>. Analysis of this protein sequence reveals the following:

Possible site: 14

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.0302 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below:

Identities = 85/100 (85%), Positives = 96/100 (96%)

Query: 1 MIKILAACGAGVNSSHQIKDAIETQLGDRGYNVHCDAVMVKDITEEMVNKYDIPTPIAKT 60

MIKILAACGAGVNSSHQIKDAIETQ+ DRGY+VHCDAVMVKDITEE+V++YDIPTPIAKT
Sbjct: 1 MIKILAACGAGVNSSHQIKDAIETQMSDRGYDVHCDAVMVKDITEELVSRYDIPTPIAKT 60

Query: 61 DLGFNVPIPVVEAGPILYRIPVMSEPVFXALEQVIKEHNL 100

DLGF +PIP+VEAGPILYRIP+MSEPVF  LE+VIKE++L
Sbjct: 61 DLGFEMPIPIVEAGPILYRIPIMSEPVFAELERVIKENHL 100

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 494 

A DNA sequence (GBSx0532) was identified in S.agalactiae <SEQ ID 1579> which encodes the amino 
acid sequence <SEQ ID 1580>. This protein is predicted to be GatA. Analysis of this protein sequence 
reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2078 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10193> which encodes amino acid sequence <SEQ ID
10194> was also identified.

The protein has homology with the following sequences in the GENPEPT database: 



Query: 16 QEELFDLVSKALIKQHYVSPNYRQAVKEREREFPTGLKIDLKDGTPIQYVAIPHTETQYC 75 

Q L +++S+ L+++ YV + +A+ +RE+++PTGL+++ VAIPHT ++Y 

Sbjct: 20 QTNLLEVLSQYLLQKGYVKTEFSKAILQREKDYPTGLQLE NMAVAI PHTYSEYV 73 

Query: 76 LVDRIFYVKNSQPITFKHMINPEEECRVQDFFFIINSRN-SNQSDILSNLITFFITKGNL 134 

L 1+ K 4PI+F M E+E + + ++ N +Q+ +L+ L+T F + 
Sbjct: 74 LKPFIYINKLKEPISFIQM-GTEDEIVMARYVIVLGISNPKDQAGLLAELMTLFSNPKIV 132 

Query: 135 DRLHELGDNKEKINH 149 

+L E4 KE + + 
Sbjct: 133 QQL - EMAQTKEALKN 146 

A related DNA sequence was identified in S.pyogenes <SEQ ID 1581> which encodes the amino acid 
sequence <SEQ ID 1582>. Analysis of this protein sequence reveals the following: 

Possible site: 33 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0. 3130 (Affirmative) < suco 

bacterial membrane Certainty=0. 0000 (Not Clear) < suco 

bacterial outside Certainty=0. 0000 (Not Clear) < suco 
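
The localisation calls in a Final Results block can be read programmatically. The following is a minimal sketch and not part of the original analysis; the names `parse_final_results` and `best_site` are hypothetical, and the tolerance for stray spaces inside the certainty value is an assumption made to cope with the way the values are printed here.

```python
import re
from typing import Dict, Tuple

# Matches lines such as:
#   bacterial cytoplasm Certainty=0. 3130 (Affirmative) < suco
# Stray spaces inside the number are tolerated (assumption: they are
# extraction artefacts, not part of the value).
_LINE = re.compile(
    r"bacterial\s+(?P<site>\w+)\W*Certainty\s*=\s*"
    r"(?P<value>[\d. ]+)\((?P<verdict>[^)]*)\)"
)


def parse_final_results(text: str) -> Dict[str, Tuple[float, str]]:
    """Map each localisation site to (certainty, verdict)."""
    results: Dict[str, Tuple[float, str]] = {}
    for line in text.splitlines():
        match = _LINE.search(line)
        if match:
            value = float(match.group("value").replace(" ", ""))
            results[match.group("site")] = (value, match.group("verdict").strip())
    return results


def best_site(results: Dict[str, Tuple[float, str]]) -> str:
    """Return the site with the highest certainty value."""
    return max(results, key=lambda site: results[site][0])
```

Applied to the three lines above, `best_site(parse_final_results(block))` would select the cytoplasm entry, the only one marked Affirmative.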

An alignment of the GAS and GBS proteins is shown below: 

Identities = 102/154 (66%) , Positives = 122/154 (78%) 
Query: 4 OTQDILFIDAHSQEELFDLVSKALIKQHWSPNYRQAVKEREREFPTGLKIDLKDGTPIQ 63 
Query: 64 YVAIPHTETQYCLVDRIFYVKNSQPITFKHMINFEEECRVQDFFFIINSRHSNQSDILSN 123 

Y AIPHTET+YCLVD++ YV+NSQ +TFKHMINPEE+C V DFFFIINS+N Q+ ILSN 
Sbjct: 61 YAAIPHTETKYCLVDQWYi^NSQALTFI-MMINPEEDCLVTDFFFIINSQNEGQTTILSN 120 

Query: 124 LITFFITKGNLDRLHELGDNKEKINHYLIEKGVF 157 

LITFFITKGNL L L D4-K+ I++YLIEKGVF 
Sbjct: 121 LITFFITKGNriSYIASLKDDKQAISNYLIEKGVF 154 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 495 

A DNA sequence (GBSx0533) was identified in S.agalactiae <SEQ ID 1583> which encodes the amino 
acid sequence <SEQ ID 1584>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence 



Final Results 

bacterial cytoplasm --- Certainty=0 . 1429 (Affirmative) < suco 

bacterial membrane --- Certainty=0 . 0000 (Not Clear) < suco 

bacterial outside — Certainty=0 . 0000 (Not Clear) < suco 

The protein has homology with the following sequences in the GENPEPT database: 





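[Alignment only partly recoverable: the database entry header and the sequence rows are lost. Surviving block coordinates and match lines:] 

Query: 1 / Sbjct: 7 

Query: 61 / Sbjct: 67 
T EK++IA+ A I DG+TIFIGPGTTL +LA +L 

Query: 118 / Sbjct: 124 
+KIRV+TNSLPVF ILW S T+DL+L+GGEYREITGAFVGS+ ++K++ F+KAFV +N 

Query: 178 / Sbjct: 184 
SIATY + EG IQ++ALNN+ EKFLLVDS KF +YDF+ FY 

Query: 238 / Sbjct: 244 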



A related DNA sequence was identified in S.pyogenes <SEQ ID 1585> which encodes the amino acid 
sequence <SEQ ID 1586>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0. 0740 (Affirmative) < suco 

bacterial membrane Certainty=0 . 0000 (Not Clear) < suco 

bacterial outside --- Certainty=0 . 0000 (Not Clear) < suco 



An alignment of the GAS and GBS proteins is shown below: 
Identities = 161/252 (63%), Positives = 195/252 (76%), Gaps = 3/252 (1%) 

Query: 1 MLKRERLQKIIEKVNINGIVTvNEIMEELDVSnMTV^ 60 

MLKRERL KI E VN GIVTVN+I++ L+VSDMTVRRDLDEL+KAG LIRIHGGAQ + 
Sbjct: 1 MLKRERLLKITEIV^QGIVTVNDIIQTLNVSDMTVRRDLDELEKRGIOjlRIHGGAQSIT 60 

Query: 61 ASPTPQNYEKSNTEKYDIQTMEKLEIAQFAKQFINDGETIFIGPGTTLEKLATQLLDFKI 120 

P E+SN EK +QT EK E+A +A Q +NDGETI FIGPGTTLE A QL + +1 
Sbjct: 61 M PNKKERSNIEKQTVQTKEKWELASYATQLVNDGETIFIGPGTTIiECFAEQLKNRQI 117 

Query: 121 RVVTNSLPVFNIIiNQSSTIjDLILVGGEYREITGAFVGSvTINSIKSLNFSKAFVSSNGVF 180 

R+VTNSLPVFNIL S T+DML+GGEYR ITGAFVGS+ +1 SL F+KAF+S NG++ 
Sbjct: 118 RIVTOSLPVFNILQDSETIDLILIGGEYRSITGAFVGSLASQNISSLKFAKAFISCNGIY 177 

Query: 181 EKSIATYDEGEGEIQRlALNNSFEKFLjVDSQKFGKYDFYTFyQLDDIDFVLTDHNIDNV 240 

+ IATY E EGEIQ++A NWS EK+LLVD+QKF YDF+ FY L++ID V+TD I 
Sbjct: 178 E^IATYSETEGEIQKLAFNNSIEKYLLVDNQKF^YDFFIFYHLIMIDAVVTDSQITED 237 

Query: 241 VKEQYSSFTKIL 252 

V E+YS FT++L 
Sbjct: 238 VIERYSQFTQLL 249 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 496 

A DNA sequence (GBSx0534) was identified in S.agalactiae <SEQ ID 1587> which encodes the amino 
acid sequence <SEQ ID 1588>. Analysis of this protein sequence reveals the following: 

Possible site: 19 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0 . 3436 (Affirmative) < suco 

bacterial membrane --- Certainty=0 . 0000 (Not Clear) < suco 

bacterial outside Certainty=0.0000 (Not Clear) < suco 

The protein has homology with the following sequences in the GENPEPT database: 



Query: 11 DLSESELKAAQEFLSGKSEANQDKPKTGKTAQEIYEAIEPKAIVKPEDLLFGIAQATDYK 70 

DL++ + L K D TG IEP+ V L AT 

Sbjct: 526 DLTQIAFAEQELMLKDKKHYRYDIVDTG IEPRLAVDVSSLPMHAGNATYDT 576 

Query: 71 NGTFVIPHKDHYHTVELKWFDEEKDLIADSDKTYSLEDYIATAIOTMMHPEKRPKVEGWG 130 

+FVIPH DH H V W + +AT KY M HPE RP V W 

Sbjct: 577 GSSFVI PHIDHIHWPYSWIiTRNQ IATIKYVMQHPEVRPDV- -WS 619 

Query: 131 KDAEIYKEKDSNKADKPSPAPTDNKSTSN3SDKNLSAAEVFKQAKPEKIVPLDKIAAHMA 190 

K + + + P+ P D ++ + SA EV +K + + AA 
Sbjct: 620 KPGH EESGSVI PNVTPLDKRAGMPNWQ1 IHSAEEV QKALAEGRFAA 665 



Sbjct: 666 PDGYIFDPRDVLAKETFWKDGSFSIPRADGSSLRTIKKSDLSQAEWQQAQELL 719 

Query: 249 KEKGWGHDSDHNKGSNKDNKAKNYAPDEEPEDSGKVTHNYGFYDVNKGSDEEEP-EKQED 307 

+K G +D +K P+E+ + +K ++ ++P E ++ ' 

Sbjct: 720 AKKNAGDATDTDK PEEKQQ ADKSNENQQPSEASKE 754 

Query: 308 ESELDEYELG^QNAKKYGIVJDRQSFEKQLIQLSNKYSVSFESFNYINGSQVQVTKKDGSK 367 

E E D++ + YG+DR + E + QL+ K ++ + VQ K+G 

Sbjct: 755 EKESDDF IDSLPDYGLDRATLEDHINQLAQKANID-PKYLIFQPEGVQFYNKNGEL 809 
Query: 368 VLVDIKTLTEV 378 

V DIKTL ++ 
Sbjct: 810 VTYDIKTLQQI 820 


A related DNA sequence was identified in S.agalactiae <SEQ ID 6983> which encodes the amino acid 
sequence <SEQ ID 6984>. Analysis of this protein sequence reveals the following: 
Possible site: 26 

>>> Seems to have an uncleavable N-term signal seq 

Final Results 

bacterial membrane Certainty=0. 0000 (Not Clear) < suco 

bacterial outside --- Certainty=0. 0000 (Not Clear) < suco 
bacterial cytoplasm Certainty=0 . 0000 (Not Clear) < suco 


A related GBS gene <SEQ ID 8581> and protein <SEQ ID 8582> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 2 
McG: Discrim Score: 6.06 
GvH: Signal Score (-7.5): -5.61 

Possible site: 26 
>>> Seems to have an uncleavable N-term signal seq 
ALOM program count: 0 value: 2.23 threshold: 0.0 
PERIPHERAL Likelihood = 2.23 6 
modified ALOM score: -0.95 

*** Reasoning Step: 3 

Final Results 

bacterial membrane Certainty=0.0000 (Not Clear) < suco 

bacterial outside — Certainty=0 . 0000 (Not Clear) < suco 

bacterial cytoplasm Certainty=0. 0000 (Not Clear) < suco 



A related DNA sequence was identified in S.pyogenes <SEQ ID 1589> which encodes the amino acid 
sequence <SEQ ID 1590>. Analysis of this protein sequence reveals the following: 
Possible site: 26 

>>> Seems to have an uncleavable N-term signal seq 

Final Results 

bacterial membrane Certainty=0.0000 (Not Clear) < suco 

bacterial outside Certainty=0 . 0000 (Not Clear) < suco 

bacterial cytoplasm Certainty=0 . 0000 (Not Clear) < suco 



An alignment of the GAS and GBS proteins is shown below. 

Identities = 808/825 (97%) , Positives = 816/825 (97%) , Gaps = 3/825 (0%) 

Query: 2 KKTYGYIGSVAAILLATHIGSYQLGKHHMGLATKDNQIAYIDDSKGKVKAPKTNKTMDQ 50 

KKTYGYIGSVAAILLATHIGSYQLGKHHMG ATKDNQIAYIDDSKGK KAPKTNKTMDQ 
Sbjct: 2 KKTYGYIGSVAAILLATHIGSYQLGKHHMGSATKDNQIAYIDDSKGKAKAPKTNKTMDQ 60 

Query: 61 ISAEEGISAEQIWKITDQGYVTSHGDHYHFYNGKVPYDAIISEELLMTDPNYHFKQSDV 120 

ISAEEGISAEQIWKITDQGYVTSHGDHYHFYNGKVPYDAIISEELLMTDPNY FKQSDV 
Sbjct: 61 ISAEEGISAEQIWKITDQGYVTSHGDHYHFYNGKVPYDAIISEELLMTDPNYRFKQSDV 120 

Query: 121 INEILDGYVIKWGNYYVYLKPGSKRKNIRTKQQIAEQVAKGTKEAKEKGLftQVAHLSKE 180 

INEILDGYVIKVNGNYYvYLOTGSKRKHIRTKQQIAEQVAKGTKEAKEKGLAQVAHLSKE 
Sbjct: 121 INEILDGWIKVNGNYYVYLKPGSKRKNIRTKQQIAEQVAKGTKEAKEKGLAQVAHLSKE 180 

Query: 181 EVAAVNEAKRQGRYTTDDGYIFS PTD 1 1 DDLGDAYLVPHGNHYHYI PKKDLSPSELAAAQ 240 

EVAAVNFjAKRQGRYTTDDGYIFSPTDIIDDLGDAYLVPHGNHYHYIPKKDLSPSELAAAQ 
Sbjct: 181 EVAAVNFAKRCjGRYTTDDGYIFSPTDIIDDLGDAYLVPHGNHYHYIPKKDLSPSELAAAQ 240 
Query: 241 AYWSQKQGRGARPSDYRPTPAP--GRRKAPIPDTOPNPGQGHQPDNGGYHPAPPRPNDAS 298 

AYWSQKQGRGARPSDYRPTPAP GRRKAPI PDVTPNPGQGHQPDNGGYHPAPPRPNDAS 
Sbjct: 241 AYWSQKQGRGARPSDYRPTPAPAPGRRKaPIPDVTENPGQGHQPDNGGYHPAPPRPNDAS 300 

Query: 299 QNKHQRDEFKGKTFKELLDQLHRLDLKYRHVEEDGLIFEPTQVIKSNAPGYWPHGDHYH 358 

QNKHQRDEFKGKTFKELLDQLHRLDLKYRHVEEDGLIFEPTQVIKSNAFGYWPHGDHYH 
Sbjct: 301 QNKHQRDEFKGKTFKELLDQLHRLDLKYRHVEEDGLIFEPTQVIKSNAFGYWPHGDHYH 3 SO 

Query: 359 IIPRSQLSPLEMELADRYIAGQTDDNDSGSDHSKPSDKEVTHTFLGHRIKAYGKGLDGKP 418 

IIPRSQLSPLEMELADRYLAGQT+D+DSGSDHSKPSDKEVTHTFLGHR1KAYGKGLDGKP 
Sbjct: 361 1 1 PRSQLSPLEMELADRYLAGQTEDDDSGSDHSKPSDKEVTHTFLGHR1 KAYGKGLDGKP 420 



Query: 419 YDTSDAYVFSKESIHSVDKSGVTAKHGDHFHYIGFGELEQYELDEVANWVKAKGQADELV 478 
YDTSDAYVFSKESIHSVDKSGVTAKHGDHFHYIGFGELEQYELDEVANWVKAKGQADEL 

Sbjct: 421 YDTSDAYVFSKESIHSVDKSGVTAKHGDHFHYIGFGELEQYELDEVANVIVKAKGQADELA 480 



Query: 479 AALDQECGKEKPLFDTKKVSRKVTiCDGKVGYIKPKDGKDYFYARYQLDLTQIAFAEQELM 538 

AALDQEQGKEKPLFDTKKVSRKOTKDGKVGY+MPKEGKDYFYAR QLDLTQIAFAEQELM 
Sbjct: 481 AALDQEQGKEKPLFDTKKVSRKVTKDGKVGYMMPKDGKDYFYARDQLDLTQIAFAEQELM 540 

Query: 539 LKDKKHYRYDIVDTGIEPRLAVDLSSLPMHAGNATYDTGSSFVIPHIDHIHWPYSWLTR 598 

LKDKKHYRYDIVDTGIEPRLAVD+SSLPMHAGNATYDTGSSFVIPHIDHIHWPYSWLTR 
Sbjct: 541 LKDKKHYRYDIVnTGIEPRLAVDVSSLPMHAGNATYDTGSSFVIPHIDHIHWPYSWLTR 600 



Query: 599 
Sbjct: 601 

Query: 659 AEGRFARPDGYIFDPRDVLAKETFVWKDGSFSIPRADGSSLRTINKSDLSQAEWQQAQEL 718 

AEGRFA PDGYIFDPRDVLAKETFVWKDGSFSIPRADGSSLRTINKSDLSQAEWQQAQEL 
Sbjct: 661 AEGRFATPDGYIFDPRDVLAKETFVWKDGSFSIPRADGSSLRTINKSDLSQAEWQQAQEL 720 

Query: 719 LAKKNAGDATDTDKPEEKQQADKSNSNQQPSEASK-EEKESDDFIDSLPDYGLDRATLED 777 

LAKKNAGDATDTDKP+EKQQADKSNENQQPSEASK EEKESDDFIDSLPDYGLDRATLED 
Sbjct: 721 LAKKNAGDATDTDKPKEKQQADKSN3NQQPSEASKEEEKESDDFIDSLPDYGLDRATLED 780 

Query: 778 HINQLAQKANIDPKYL1 FQPEGVQFYNKNGELVTYDI KTLQQINP 822 

HINQLAQKANIDPKYLIFQPEGVQFYNKNGELVTYDIXTLQQINP 
Sbjct: 781 HINQLAQKANIDPKYL1FQPEGVQFYNKNGELVTYDIKTLQQINP 825 

SEQ ID 8582 was expressed in E.coli in two different forms. GBS293dNterm was expressed in E.coli as a 
GST-fusion product. SDS-PAGE analysis of total cell extract is shown in Figure 147 (lane 14; MW 74kDa 
+ lanes 17 & 18; MW 48.8kDa). GBS293C was expressed in E.coli as a GST-fusion product. SDS-PAGE 
analysis of total cell extract is shown in Figures 148 (lane 2-4; MW 71kDa + lanes 5 & 7; MW 46kDa) and 
182 (lane 7; MW 46kDa). Purified GBS293C-His is shown in Figure 241, lanes 8 & 9. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 



Example 497 

A DNA sequence (GBSx0535) was identified in S.agalactiae <SEQ ID 1591> which encodes the amino 
acid sequence <SEQ ID 1592>. Analysis of this protein sequence reveals the following: 

Possible site: 23 

>>> Seems to have a cleavable N-term signal seq. 

Final Results 

bacterial outside Certainty=0 . 3000 (Affirmative) < suco 

bacterial membrane Certainty=0.0000 (Not Clear) < suco 

bacterial cytoplasm — Certainty=0 . 0000 (Not Clear) < suco 
The protein has homology with the following sequences in the GENPEPT database: 



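[Alignment only partly recoverable: the database entry header and most sequence rows are lost.] 

Query: 4 KKTV- 1 ISALSVALFGTGVGAYQLGSYNA- - QKSDNSVSWKTDKSDSKAQATAVNKTPD 60 
KKT I +++ L T +G+YQLG ++ DN ++Y+ D S K +A NKT D 
Sbjct: 2 KCTYGYIGSVAAILIATHIGSYQLGKHHMGLATKDNQIAYI--DDSKGKVKAPKTNKTMD 59 

Query: 61 (row not recoverable) 
QIS EEGISAEQIWKITD GYVTSHGDHYH+YNGKVPYDAIISEEL+M DP+Y F ++D 
Sbjct: 60 (row not recoverable) 

Query: 121 (row not recoverable) 
VINE+ DGY+IKVNG YY+YLK GSKR N+RTK+QI +Q 
Sbjct: 120 (row not recoverable) 

[The remaining blocks (Query 181 / Sbjct 175, Query 237 / Sbjct 235, Query 295 / Sbjct 289, Query 355 / Sbjct 345, Query 414 / Sbjct 395) survive only as two match-line fragments:] 

EL+AAQAYW++K GR 

+SPLE+E A R +A 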



There is also homology to SEQ ID 1590. 

SEQ ID 1592 (GBS94) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 17 (lane 3; MW 52.5kDa). 

GBS94-His was purified as shown in Figure 194, lane 8. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 498 

A DNA sequence (GBSx0536) was identified in S.agalactiae <SEQ ID 1593> which encodes the amino 
acid sequence <SEQ ID 1594>. This protein is predicted to be Lmb. Analysis of this protein sequence 
reveals the following: 

Possible site: 24 

>>> May be a lipoprotein 

Final Results 

bacterial membrane Certainty=0. 0000 (Not Clear) < suco 

bacterial outside Certainty=0. 0000 (Not Clear) < suco 

bacterial cytoplasm Certainty=0 . 0000 (Not Clear) < suco 



There is also homology to SEQ IDs 1596 and 5548. 

A related GBS gene <SEQ ID 8583> and protein <SEQ ID 8584> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: 22 Crend: 5 
McG: Discrim Score: 13.64 
GvH: Signal Score (-7.5): -5.75 

Possible site: 24 
>>> May be a lipoprotein 

ALOM program count: 0 value: 4.83 threshold: 0.0 
PERIPHERAL Likelihood = 4.83 33 
modified ALOM score: -1.47 

Final Results 

bacterial membrane Certainty=0 . 0000 (Not Clear) < suco 

bacterial outside --- Certainty=0 . 0000 (Not Clear) < suco 

bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco 

SEQ ID 8584 (GBS22) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 14 (lane 6; MW 35kDa). 

The GBS22-His fusion product was purified (Figure 94A; see also Figure 193, lane 4) and used to immunise 
mice (lane 2 product; 20 µg/mouse). The resulting antiserum was used for Western blot (Figure 94B), FACS 
(Figure 94C), and in the in vivo passive protection assay (Table III). These tests confirm that the protein is 
immunoaccessible on GBS bacteria and that it is an effective protective immunogen. 

SEQ ID 8584 (GBS22) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 183 (lane 7 & 8; MW 35kDa). 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 499 

A DNA sequence (GBSx0537) was identified in S.agalactiae <SEQ ID 1597> which encodes the amino 
acid sequence <SEQ ID 1598>. Analysis of this protein sequence reveals the following: 

Possible site: 39 

>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -0.59 Transmembrane 19 - 35 ( 19 - 35) 

Final Results 

bacterial membrane Certainty=0.1235 (Affirmative) < suco 

bacterial outside --- Certainty=0. 0000 (Not Clear) < suco 
bacterial cytoplasm Certainty=0 . 0000 (Not Clear) < suco 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAA51352 GB:X72832 ORF1 [Streptococcus equisimilis] 

Identities = 104/145 (71%), Positives = 126/145 (86%) 

Query: 1 MKI I IQRVNQASVSIEDDWGSIEKGLVLLVGIAPEDTTEDIAYAVRKITSMRI FSDDEG 60 
MK+4+QRV 4ASVSI+ + G+I +GL+LLVG+ P+D ED+AYAVRKI +MRIFSD +G 
Sbjct: 1 MKLVLQRVKEASVSIDGKIAGAINCGLLLLVGVGPDDAAEDLAYAVRKIVNMRIFSDADG 60 

Query: 61 KMNLSIQDIKGSVLSISQFTLFADTKKGNRPAFTGAADPVKANQFYDIFNQELANHVSVE 120 

KMN SIQDIKGS4LS+SQFTL+ADTKKGNRPAFTGAA P A+QFYD FN++LA+ V VE 
Sbjct: 61 KMNQSIQDIKGSILSVSQFTLYADTKKGNRPAFTGAAKPDMASQFYDRFNEQLADFVPVE 120 


Query: 121 TGQFGADMQVSLINDGPVTIVLDTK 145 

G FGADMQVSLINDGPVTI+LDTK 
Sbjct: 121 RGVFGADMQVSL1NDGPVTI ILDTK 145 
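
The identity figures quoted in alignment headers such as "Identities = 104/145 (71%)" can be reproduced from the aligned rows. The sketch below is illustrative only; the function names are hypothetical, and the use of integer truncation is an assumption that merely appears consistent with the identity percentages printed in this section (104/145 is 71.7%, printed as 71%). No claim is made about how the Positives percentages were derived.

```python
def truncated_percent(matches: int, aligned_length: int) -> int:
    """Percentage of matches, truncated (not rounded) to an integer."""
    if aligned_length <= 0:
        raise ValueError("aligned_length must be positive")
    return (100 * matches) // aligned_length


def count_identities(query_row: str, subject_row: str) -> int:
    """Count columns where both aligned rows carry the same residue.

    Gap characters ('-') never count as identities.
    """
    if len(query_row) != len(subject_row):
        raise ValueError("aligned rows must have equal length")
    return sum(
        1 for q, s in zip(query_row, subject_row) if q == s and q != "-"
    )
```

For the final block of the alignment above (residues 121 to 145), `count_identities("TGQFGADMQVSLINDGPVTIVLDTK", "RGVFGADMQVSLINDGPVTIILDTK")` gives 22 identical columns out of 25.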
A related DNA sequence was identified in S.pyogenes <SEQ ID 1599> which encodes the amino acid 
sequence <SEQ ID 1600>. Analysis of this protein sequence reveals the following: 

Possible site: 39 
>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0 . 1430 (Affirmative) < suco 

bacterial membrane — Certainty=0 . 0000 (Not Clear) < suco 

bacterial outside --- Certainty=0.0000 (Not Clear) < suco 

An alignment of the GAS and GBS proteins is shown below: 

Identities = 103/145 (71%) , Positives = 124/145 (85%) 

Query: 1 MKI I IQRvNQASVSIEDDWGS IEKGLVLLVGI APEDTTEDIAYAVRKITSMRI FSDDEG 60 

MK+++QRV +ASVSI+ + G+I -W3L+LLVG+ P+D ED+AYAVRKI +MRIFSD +G 
Sbjct: 1 MKLVLQRVKEASVS IDGKI AGAINQGLLLLVGVGPDDNAEDIAYA VRKI VNMRI FSDADG 60 

Query: 61 KMNLSIQDIKGSVIjSISQFTLFADTKKGNRPAFTGAADPvKANQFYDIFNQELANHVSVE 120 
KMN SIQDIKGS+LS+SQFTL+ADTKKGNRPAFTGAA P A+Q YD FN++LA V VE 

Sbjct: 61 KMNQS IQD IKGS I LSVSQFTLYADTKKGNRPAFTGAAKPDLASQLYDS FNEQLAEFVPVE 120 

Query: 121 TGQFGADMQVSL1NDGPVTIVLDTK 145 
G FGADMQVSLINDGPVTI+LDTK 
Sbjct: 121 RGVFGADMQVSLINDGPVTI ILDTK 145 
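
Because each alignment row carries its start and end coordinates, a row can be sanity-checked by comparing the number of residues printed with the span implied by the coordinates. The following is a minimal sketch under the assumption that rows follow the "Query:/Sbjct: start residues end" layout used here; `row_is_consistent` is a hypothetical helper, not part of the original analysis.

```python
import re

# One alignment row: label, start coordinate, residues, end coordinate.
_ROW = re.compile(r"^(Query|Sbjct):\s*(\d+)\s+(.*?)\s+(\d+)\s*$")


def row_is_consistent(row: str) -> bool:
    """True if the residue count equals end - start + 1.

    Whitespace and gap characters ('-') are ignored, since gaps do not
    advance the sequence coordinate.
    """
    match = _ROW.match(row.strip())
    if match is None:
        raise ValueError("not an alignment row: %r" % row)
    start, end = int(match.group(2)), int(match.group(4))
    residues = re.sub(r"[\s\-]", "", match.group(3))
    return len(residues) == end - start + 1
```

A single-character substitution leaves the count intact, so a row such as `"Query: 121 TGQFGADMQVSL1NDGPVTIVLDTK 145"` still passes; a row that fails the check has lost or gained characters and is better compared against another copy of the same sequence.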

SEQ ID 1598 (GBS368) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 64 (lane 4; MW 20kDa). It was also expressed in E.coli as a GST-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 70 (lane 4; MW 45kDa). 

GBS368-GST was purified as shown in Figure 215, lane 6. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 500 

A DNA sequence (GBSx0538) was identified in S.agalactiae <SEQ ID 1601> which encodes the amino 
acid sequence <SEQ ID 1602>. This protein is predicted to be stringent response-like protein (rel) (relA). 
Analysis of this protein sequence reveals the following: 

Possible site: 37 

>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -0.32 Transmembrane 60 - 76 ( 60 - 76) 

Final Results 

bacterial membrane Certainty=0.1128 (Affirmative) < suco 

bacterial outside — Certainty=0 . 0000 (Not Clear) < suco 

bacterial cytoplasm --- Certainty=0 . 0000 (Not Clear) < suco 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAA51353 GB:X72832 stringent response-like protein 
[Streptococcus equisimilis] 
Identities = 647/739 (87%) , Positives = 696/739 (93%) , Gaps = 1/739 (0%) 

Query: 1 MVKEINLTGEEWAITSQYMSETDVAFVKFALNYATAAHYYQARKSGEPYIIHPIQVAGI 60 

M KEINLTGEEWA+ ++YM+ETD AFVK AL+YATAAH+YQ RKSGEPYI +HPIQVAGI 
Sbjct: 1 MAKEINLTGEEWAIAAKYMNETDAAFWKALDYA^ 60 



IYKSHEEQL 1 
iTKSHEEQL 

Sbjct: 61 LADLHLDAVTVACGPLHDVVEDTDITLDNIEFDFGKDWDIVDGVTKLGKVEYKSHEEQL 120 



Query: 121 AENHRKMLMAMSKDIRVILVKIiADRLHNMRTLKHIJ^IGDKQERISE 

AENHRKMLMAMSKDIRVILVKIiM>RLHNmTLKH^^ 
Sbjct: 121 AENHRK^MAMSKDIRVILVKLADRLHJMRTLKHLRKDKQERISRETMEIYAPLRHRLGI 180 

Query: 181 SRIKWELEDLSFRYLNETEFYK1SHMMSEKRREREELVDIIVDKIRSYTEEQGLYGDIYG 240 

SRIKWELEDL+FRYLNETEFYKISHMM+EKRRERE LVD IV KE+SYT EQGL+GD+YG 
Sbjct: 181 SRIKWELEDLAFRYLNETEFYK1SHMMNEKRREREALVDDIVTKIKSYTTEQGLFGDVYG 240 

Query: 241 RPKHIYSIYRKMRDKKKRFDQIYDLIAIRCIMETASDVYAMVGYIHELWRPMPGRFKDYI 300 

RPKHIYSIYRKMRDKKKRFDQI+DLIAIRC+MET SDVYAMVGYIHELWRPMPGRFKDYI 
Sbjct: 241 RPKHIYSIYRKMRDKKKRFDQIFDLIAIRCVMETQSDVYAMVGYIHELWRPMPGRFKDYI 300 

Query: 301 ARPKANGYQSIHTTWGPKGPIErQIRTKEMHQVAEFGVAAHWAYKKGITSKVMQAEQSV 360 

AAPKANGYQSIHTTVYGPKGPIEIQIRTKEMEQVAE+GVAAKWAYKKG+ KVNQAEQ V 
Sbjct: 301 AAPKANGYQSIHTTVYGPKGPIEIQIRTKEMHQVAEYGVAAHWAYKKGVRGKVNQAEQKV 360 

Query: 361 GMGWIQELVELQDESK-DAKDFVDSVKEDIFTERIYVFTPNGAVQELPRESGPIDFAYAI 419 

GM WI+ELVELQD S DA DFVDSVKEDI F+ERIYVFTP GAVQELP++SGPIDFAYAI 
Sbjct: 361 GMNWIKELVELQDASNGDAVDFVDSVKEDIFSERIYVFTPTGAVQELPKDSGPIDFAYAI 420 

Query: 420 HTQVGEKATGAI<VNGRMVPLTAKLKTGDWEIITNPNSFGPSRDWIKIVKTNKARNKIRQ 479 

HTQVGEKA GAKVNGRMVPLTAKLKTGDWEI+TNPNSFGPSRDWIK+VKTNKARNKIRQ 
Sbjct: 421 HTQVGEKAIGAI<VNGRMVPLTAKLKTGDWEIVTNPNSFGPSRDWIKLVKTNKARNKIRQ 480 

Query: 480 FFKNQDKETS INKGRELLVDYFQEQGYVPNKYLDKKHIEE ILPRVSVKSEEALYAAVGFG 539 

FFKNQDKE S+NKGR++LV YFQEQGYV NKYLDKK IE ILP+VSVKSEE+LYAAVGFG 
Sbjct: 481 FFKNQDKELSVNKGRDMLVSYFQEQGYVANKYLDKKRIEAILPKVSVKSEESLYAAVGFG 540 

Query: 540 DLSPISIFNKLTEICERREEERAKAKAEADELINGGEIKTDKRDVLKVKSENGVIIQGASG 599 
D+SP+S+FNKLTEKERREEERAKAKAEA+EL+NGGEIK + +DVLKV+SENGVI IQGASG 
Sbjct: 541 DISPVSVFNKLTEKERREEERAKAKAEAEELWGGEIKHENKDVLKVRSENGVIIQGASG 600 

Query: 600 LLMRIAKCCNPVPGDLIEGYITKGRGVAIHRSDCQNLKSQENYEQRLIDVEWDDDGSKKE 659 

LLMRIAKCCNPVPGD IEGYITKGRG+AIHR+DC N+KSQ+ Y++RLI+VEWD D S K+ 
Sbjct: 601 I 



Query: 720 WDKIKIIPDVYSVKRTNG 738 

W+KIK + PDVYS VKRTNG 
Sbjct: 721 WEKIKAVPDVYSVKRTNG 739 

A related DNA sequence was identified in S.pyogenes <SEQ ID 1603> which encodes the amino acid 
sequence <SEQ ID 1604>. Analysis of this protein sequence reveals the following: 

Possible site: 41 
>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -0.32 Transmembrane 64 - 80 ( 64 - 80) 



Final Results 

bacterial membrane --- Certainty=0. 1128 (Affirmative) < succi 

bacterial outside Certainty=0 . 0000 (Not Clear) < suco 

bacterial cytoplasm Certainty=0. 0000 (Not Clear) < suco 

The protein has homology with the following sequences in the databases: 

>GP:CAA51353 GB:X72832 stringent response-like protein 
[Streptococcus equisimilis] 
Identities = 700/739 (94%), Positives = 721/739 (96%) 



Query: 5 MAKIMNOTGEEVIALAATYMTKADVAFVAKALAYATAAHFYQVRKSGEPYIVHPIQVAGI 64 

Query: 65 LADLHLDAVTVACGFLHDWEDTDITLDEIEAEFGHDARDIVDGVTKLGEVEYKSHEEQL 124 

LADLHLDAVTVACGFLHDVVEDTDITIiD IE DFG D RDIVDGVTKLG+VEYKSHEEQL 
Sbjct: 61 LADLHLDAVTVACGFLHDWEDTDITLDNIEFDFGKDVRDIVDGVTKLGKVEYKSHEEQL 120 



Query: 125 i 

AEIfflRKMLMAMSKDIRVILVKLADRLHNMRTLKHLRKDKQERISRETMEIYAPLAHRLGI 
Sbjct: 121 AENHRKMIjMAMSKDIRVILVICLADRLHNMJTLKHIiRKDK^ 180 

Query: 185 SRI KWELEDLAFRYLNETEFYKI SHMMKEKRREREALVEAI VSKVKTYTTQQGLFGDVYG 244 

SRI KWELEDLAFRYLNETEFYKI SHMM EKRREREALV+ IV+K+K+YTT+QGLFGDVYG 
Sbjct: 181 SRI KWELEDLAFRYLNETEFYKI SHMMNEKRREREALVDD I VTKI KS YTTEQGLFGDVYG 240 

Query: 245 RPKHIYSIYRKMRDKKKRFDQIFDLIAIRCVMETQSDVYAMVGYIHELWRPMPGRFKDYI 304 

RPKHIYSIYRKMRDKKKRFDQIFDLIAIRCVMETQSDVYAMVGYIHELWRPMPGRFKDYI 
Sbjct: 241 RPKHIYSIYRKMRDKKKRFDQIFDLIAIRCVMETQSDVYA^IVGYIHELWRPMPGRFKDYI 300 

Query: 305 AAPKANGYQSIHTTVYGPKGPIEIQIRTKDMHQVAEYGVAAHWAYKKGVRGKVNQAEQAV 364 

AAPKANGYQSIHTTOTGPKGPIEIQIRTK+MHQVAEYGVAAHWAYKKGVRGKVNQAEQ V 
Sbjct: 301 AAPKANGYQSIHTTVYGPKGPIEIQIRTKEMHQVAEYGVAAHWAYKKGVRGKVNQAEQKV 360 

Query: 365 GMNWIKELVELQDASNGDAVDFVDSVKEDIFSERIYVFTPTGAVQELPKESGPIDFAYAI 424 

GMNWIKELVELQDASNGDAVDFVDSVKEDIFSERIYVFTPTGAVQELPK+SGPIDFAYAI 
Sbjct: 361 GMNWIKELVELQDASNGDAVDFVDSVKEDIFSERIYVFTPTGAVQELPKDSGPIDFAYAI 420 

Query: 425 HTQIGEI<ATGAI^GRWPLTAKLKTGDVVEIITNANSFGPSRDWKLVKTNKARNKIRQ 4 84 

HTQ+GEKA GAKVNGRMVPL.TAKLKTGDWEI+TN NSFGPSRDW+KLVKTNKARNKIRQ 
Sbjct: 421 HTQVGEKAIGAKVNGRMVPLTAKLKTGDVVEIVTNPNSFGPSRDWIKLVKTNKARNKIRQ 480 

Query: 4B5 FFICNQDKELSTOKGRDLLVSYFQEQGYVANKYLDKKRIEAILPKVSVKSEESLYAAVGFG 544 

FFKNQDKELSVNKGRD+LVSYFQEQGYVANKYLDKKRIEAILPKVSVKSEESLYAAVGFG 
Sbjct: 481 FFKNQDKELSVMKGRDMLVSYFQEQGYVANICYLDKKRIEAILPICVSVKSEESLYAAVGFG 540 

Query: 545 D I S PI S VFNKLTEKERREEERAKAICAEAEELWGGEVKHENKDVLKVRSENGVI IQGASG 604 

DISP+SVFNKLTEKERREEERAKAKAEAEELV GGE+KHENKDVLKVRSENGVI IQGASG 
Sbjct: 541 DISPVSVFNKLTEE^RREEERAKAICAEAEELVNGGEIKHENKDVLKVRSENGVIIQGASG 600 

Query: 605 LLMRIAKCCNPVPGDPIDGYITKGRGIAIHRSDCHNIKSQDGYQERLIEVEWDLDNSSKD 664 

LLMRIAKCCNPVPGDPI+GYITKGRGIAIHR+DC+NIKSQDGYQERLIEVEWDLDNSSKD 
Sbjct: 601 LLMRIAKCCNPVPGDPIEGYITKGRGIAIHRADCNNIKSQDGYQERLIEVEWDLDNSSKD 660 

Query: 665 YQAEIDIYGLNRSGLLNDVLQILSNSTKSISTVNAQPTKDMKFANIHVSFGIPNLTHLTT 724 

YQAEIDIYGLNR GLLNDVLQILSNSTKSISTVNAQPTKDMKFANIHVSFG1PNLTHLTT 
Sbjct: 661 YQAEIDIYGIjNRRGLLNDVLQILSKSTKSIST\'NAQPTKDMKFANIHVSFGIPNLTHLTT 720 

Query: 725 WEKI KAVPDVYSVKRTNG 743 

VVEKIKAVPDVYSVKRTNG 
Sbjct: 721 WEKIKAVPDVYSVKRTNG 739 

An alignment of the GAS and GBS proteins is shown below: 

Identities = 635/739 (85%) , Positives = 691/739 (92%) , Gaps = 1/739 (0%) 

Query: 1 MVICEINLTGEEWAITSQYMSETDVAFVKFALNYATAAHYYQARKSGEPYIIHPIQVAGI 60 

M K +N+TGEEV+A+ + YM++ DVAFV AL YATAAH+YQ RKSGEPYI+HPIQVAGI 
Sbjct: 5 MAKIMNOTGEEVIALAATYMTKADVAFVAKALAYATAAHFYQVRKSGEPYIVHPIQVAGI 64 

Query: 61 LADLHLDAVTVACGFLHDVVEDTEITLDEIETD?GKDVRDIIDGVTI<LGKVEYKSHEEQL 120 

LADLHLDAVTVACGFLHDWEDT+ITLDEIE DFG D RDI+DGVTKLG+VEYKSHEEQL 
Sbjct: 65 LADLHLDAVTVACGFLHDVVEDTDITLDEIEADFGHDARDIVDGVTKLGEVEYKSHEEQL 124 



Sbjct: 125 AENHRKMLMAMSKDIRVILVKLADRLHNMRTLKHLRKDKQERISRETMEIYAPLAHRLGI 184 



Query: 181 SRIKWELEDLSFRYLNETEFYKISHMMSEKRREREEL\TJIIVDKIRSYTEEQGLYGDIYG 240 
j+FRYLNETEFYKISHKM EKRRERE LV+ IV K+++YT +QGL+GD+YG 
Sbjct: 185 SRIKWELEDLAFRYLI^TEFYKISHMMKEKRREREALVEAIVSKVKTYTTQQGLFGDVYG 244 

Query: 241 RPKHIYSIYRKMRDKKKRFDQIYDLIAIRCIMETASDVYAMVGYIHELWRPMPGRFKDYI 300 

RPKHIYS I YRKMRDKKKRFDQI +DLIAIRC+MET SDVYAMVGYIHELWRPMPGRFKDYI 
Sbjct: 245 RPKHIYSIYRKMRDKKKRFDQIFDLIAIRCVMETQSDVYAMVGYIHELWRPMPGRFKDYI 304 



Query: 301 AAPKANGYQS IHTTVYGPKGPI E I QIRTKEMHQVAEFGVAAHWAYKKGI TSKUNQAEQSV 360 
AAPKANGYQSIHTTVYGPKGPIEIQIRTK+MHQVAE+GVAAHWAYKKG+ KVNQAEQ+V 
Sbjct: 305 AAPKANGYQSIHTTVYGPKGPIEIQIRTKDMHQVAEYGVAAHWAYKKGVRGKVMQAEQAV 364 



Query: 361 GMGWIQELWLQDESK-DAKDF\TD8VKEDIFTERIYVFT?NGAVQ3LPRESGPIDFAYAI 419 

GM WI+ELVELQD S DA DFVDSVKEDIF+ERIYVFTP GAVQELP+ESGPIDFAYAI 
Sbjct: 365 GMNWIKELVELQDASNGDAVDFVDSVKEDIFSERIY'/FTPTGAVQELPKESGPIDFAYAI 424 

Query: 420 HTQVGEKATGAKVNGRMTOLTAKLKTGDVVEIITNPNSFGPSRDWIKIVKTNKARNKIRQ 479 

HTQ+GEKATGAKVNGRMVPLTAKLKTGDWEI ITN NSFGPSRDW+K+VKTNKARNKIRQ 
Sbjct: 425 HTQIGEKATGAKWGRMVPLTAKLICrGDVVEIITNANSFGPSRDWVKLVKTNKARNKIRQ 484 

Query: 480 FFKNQDKETSINKGRELLVDYFQEQGYVPNKYLDKKHIEEILPRVSVKSEEALYAAVGFG 539 

FFKNQDKE S+NKGR+LLV YFQEQGYV NKYLDKK IE ILP+VSVKSEE+LYAAVGFG 
Sbjct: 485 FFKNQDKELSVNKGRDLLVSYFQEQGYVANKYLDKKRIEAILPKVSVKSEESLYAAVGFG 544 



Sbjct: 545 D I S PI S VFNKLTEKERREEERAKAKAEAEELVKGGEVKHENKDVLKVRSENGVI I QGASG 604 



Query: 600 LLMRIAKCCNPVPGDLIEGYITKGRGVAIHRSDCQNLKSQENYEQRLIDVEWDDDGSKKE 659 

LLMRIAKCCNPVPGD I+GYITKGRG+AIHRSDC N+KSQ+ Y++RLI+VEWD D S K+ 
Sbjct: 605 LLMRIAKCCNPVPGDPIDGYITKGRGIAIHRSDCHNIKSQDGYQERLIEVEWDLDNSSKD 664 

Query: 660 YMAEIDIYGLNRSGLLNDVLQTLSNATKLVSTVNAQPTKDMKFANIHVSFGISNLAQLTT 719 

Y AEIDIYGLNRSGLLNDVLQ LSN+TK +STVNAQPTKDMKFANIHVSFGI NL LIT 
Sbjct: 665 YQAEIDIYGLNRSGLLNDVLQILSNSTKSISTVNAQPTKDMKFANIHVSFGIPNLTHLTT 724 

Query: 720 WDKI KI I PDVYSVKRTNG 738 

W+KIK + PDVYSVKRTNG 
Sbjct: 725 WEKI KAVPDVYSVKRTNG 743 





Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 



Example 501 

A DNA sequence (GBSx0539) was identified in S.agalactiae <SEQ ID 1605> which encodes the amino 
acid sequence <SEQ ID 1606>. This protein is predicted to be 2',3'-cyclic-nucleotide 2'-phosphodiesterase 
precursor (cpdB). Analysis of this protein sequence reveals the following: 

Possible site: 28 

>>> Seems to have a cleavable N-term signal seq. 

INTEGRAL Likelihood = -5.79 Transmembrane 779 - 795 ( 778 - 797) 

Final Results 

bacterial membrane --- Certainty=0 . 3314 (Affirmative) < suco 

bacterial outside — Certainty=0 . 0000 (Not Clear) < suco 

bacterial cytoplasm --- Certainty=0 . 0000 (Not Clear) < suco 



The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAB12613 GB:Z99108 similar to 2',3'-cyclic-nucleotide 
2'-phosphodiesterase [Bacillus subtilis] 
Identities = 297/630 (47%), Positives = 419/630 (66%), Gaps = 21/630 (3%) 



Query: 102 KVDLRIMSTTDLHTNLVNYDYYQDKESQKIGIjAKTAVLIEEAKKENPNTVLVDNGDVIQG 161 

+V L I++TTD+H N+++YDYY DKE+ GLA+TA LI++ +++NPNT+LVDNGD+IQG 




Sbjct: 42 QVHLSILATTDIHANl#nDYDYYSDKETMFGLfiETAQLIQKHREQNPNTLLVDNGDLIQG 101 

Query: 162 TPLGTYKAIVKP VAE^EHPMYQAMNMiGYDMTLGNHEEKYGLDYLKKIlATANLP 218 

PLG Y + ++ + HP+ MNAL YDA TLGNHEFKYGLD+L I A+ P 
Sbjct: 102 NPLGEYAVKYQKDDIISGTKTHPIISVMNMiKYDAGTLGNHEFNYGLDFLDGTIKGADFP 161 

Query: 219 IMA^DFKTHQPVFKTYDIITKTFKDSTGRAV^IGITGIVPPQIUTOKANLEGKV 278 

I+NANV 4-+ + YIKTDG ++GG VPPQI+ WDK NLEG+V 
Sbjct: 162 IVmi^T-TSGENRYTPYVINEKTLIDENGNEQKVKVGYlGFVPPQIMTWDKKNLEGQV 220 

Query: 279 IVKDSVKAIEEIVPTMRAKGADVILV1SHSC-IGDDRYEEGEEWVGYQIAS-IKGVDAVVT 337 

V+D V++ E +P M+A+GADVI+ L+H+GI G EN + +A+ KG+DA+++ 

Sbjct: 221 QVQDIVESANETIPKMKAEGADVIIALAHTGIEKQAQSSGAENAVFDLATKTKGIDAIIS 280 

Query: 338 GHSHAEFPSGNGTGFYEKYTGVDGHJ GKINGTPVTMAGKYGDHLGIIDLGLSYTNGK 394 

GH H FPS +Y GV N G ING PV M +G +LG+IDL L +G 

Sbjct: 281 GHQHGLFPSA- - EYAGVAQEWEKGTINGIPWMPSSWGKYLGVIDLKLEKADGS 333 

Query: 395 WQVSESSAKIRKIDMNSTTADERIIAIAKEAHDGTIWYVRQQVGTrrAPITSYFALVKDD 454 

W+V+4S I I N +E + ++ H T+ YVR+ VG T A I S+FA VKDD 
Sbjct: 334 WKVADSKGSIESIAGNVTSRMETVTCTIQQTHQNTIjEYVRKPVGKTEADINSFFAQVKDD 393 

Query: 455 PSVQIVNNAQRWYVANELKGTPEANLPLLSAAAPFKAGTRGDATAYTDI PAGPVAI KNVA 514 

PS+QIV +AQ+WY E+K T NLP+LSA APFKAG R A YT+IPAG +AIKNV 
Sbjct: 394 PSIQIVTDAQKWYAEKEMKDTEYKNLPILSAGAPFKAGGRNGAMYYTNIPAGDIAIKNVG 453 

Query: 515 DLYLYDNVTALLKVTGADLREWLEMSAGQFWQIDPNNraPQNIINTEYRTYNFDVIDGLT 574 

DLYLYDN ++K+TG+++++WLEMSAGQFNQIDP Q ++N +R+YNFDVIDG+T 

Sbjct: 454 DLYLYDNWQIVKLTGSEVKDWLEMSAGQFNQIDPAKGGDQALIJIENFRSYNFDVIDGVT 513 

Query: 575 YKFDITQPNKTYNKDGKVWSQASRVRDLMYNGKPVADKQEFMIVTNNYRASGTFPGAKNA 634 

Y+ D+T+P KYN++GKV+N+ +SR+ +L Y GKP++ QEF++VTNNYRASG G + 
Sbjct: 514 YQVDVTKPAKTiTSIENGKVINADSSRIIMJSYEGKPISPSQEFLVVTNNYRASGG-GGFPHL 572 

Query: 635 TMNRLLN LENRQTI INYI I SEKTINPTADNNWGFTESIKDLDLRFQTADKAICNLVTN 691 

T +++++ +ENRQ +++YII +KT+NP ADNNW + +L F+++ AK 

Sbjct: 573 TSDKIVHGSAVE^QVLIWYIIEQKTVNPKADNNWSIA-PVSGTNLTFESSLLAKPFADK 631 

Query: 692 SKDIQYIASSTKDEGFGDYRFVYTEQEKVD 721 

+ D+ Y+ S +EG+G Y+ + + D 
Sbjct: 632 ADDVAYVGKSA-NEGYGVYKLQFDDDSNPD 660 
Identities = 133/567 (23%) , Positives = 214/567 (37%) , Gaps = 147/567 (25%) 





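[Alignment only partly recoverable; rows marked as not recoverable are lost in the source.] 

Query: 104 (row not recoverable) 
Sbjct: 668 (row not recoverable) 

Query: 164 LGTYKAI VKPVAENEEHPMYQAMNALGYDASTLGNHEFNYG LDYLKKI IATAN- - - 216 
Y +A+ + MN +GYDA T GNHEP+ G D+L AT + 
Sbjct: 714 - -LYFTKWNGLAD LKJMIWIGYDA^TFGNHEFDKGPTVLSDFLSGNSATVDPAN 765 

Query: 217 LPIliNANVLDFKTHQPVFKTYDIITKTF KDSTGRAVALNIGITG- -IV 262 
PI++AWV +++P K++ +TF KG + + + G + 
Sbjct: 766 (row not recoverable) 

Query: 263 (row not recoverable) 
Sbjct: 824 (row not recoverable) 

Query: 313 (row not recoverable) 
N ++A +KG+D ++ GH+H T VD + N P 
Sbjct: 875 --HNRDLELAKKVKGIDLIIGGHTH TLVDKMEWNHEEPT ! 

Query: 371 (row not recoverable) 
V A +YG LG +D+ 
Sbjct: 913 (row not recoverable) 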



Query: 430 INYV RQQVGTTTAPITSYFALVKDDPSv^IVNNAQRWYVANELKGTPEANLPLLSA 485 






DVALDGQREHVRTKKTWLGNFIADGMLA 1009 

Query: 486 AAPFKAGTRGDAT AYTDIPAGPVAlKNVaDLYLYDMVTALLKVTGADLREWLEMSA 541 

A AG R T I G + + V ++ + N + +TG ++E LE 
Sbjct: 1010 KAKEAAGARIAITNGGGIRAGIDKGDITLGEVLNVMPFGNTLYVADLTGKQI KEALE 1066 

Query: 542 GQFNQIDPNNKAPQNIim'EYRTYNFDVlDGLTYKFDITQPNKyNKDGKVVNSQASRVRD 601 

Q + NE F+G+YF+ NKG + V+ 

Sbjct: 1067 QGLSNVENGGGAFPQVAGIEYTFTLN NKPG HRVLEVKI 1104 

Query: 602 LMYNGKPVADKQE- - FMIVTNNYRASG 626 

NG VA + + + TNN+ +G 
Sbjct: 1105 ESPNGDKVAINTDDTYRVATHNFVGAG 1131 

There is also homology to SEQ ID 1608. A related sequence was also identified in GAS <SEQ ID 9129> 
which encodes the amino acid sequence <SEQ ID 9130>. Analysis of this protein sequence reveals the 
following: 

Possible cleavage site: 27 

>>> Seems to have an uncleavable N-term signal seq 

INTEGRAL Likelihood = -4.67 Transmembrane 649 - 665 ( 648 - 666) 
INTEGRAL Likelihood = -2.02 Transmembrane 6 - 22 ( 5-22) 
peripheral Likelihood = 1.85 
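
The INTEGRAL lines list predicted membrane-spanning segments with their likelihood scores. A minimal sketch for extracting them follows; the `Segment` type and `parse_integral_segments` are hypothetical names, and reading the "ALOM program count" as the number of INTEGRAL segments scoring below the 0.0 threshold is an assumption, suggested only by the blocks in this section (count 1 with a single INTEGRAL segment, count 0 where only a PERIPHERAL value is reported).

```python
import re
from typing import List, NamedTuple


class Segment(NamedTuple):
    likelihood: float
    start: int
    end: int


# Matches e.g. "INTEGRAL Likelihood = -4.67 Transmembrane 649 - 665 ( 648 - 666)"
# and captures the score plus the first (core) coordinate pair.
_INTEGRAL = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+(?:\.\d+)?)\s+"
    r"Transmembrane\s+(\d+)\s*-\s*(\d+)"
)


def parse_integral_segments(text: str) -> List[Segment]:
    """Return every INTEGRAL transmembrane segment found in the text."""
    return [
        Segment(float(m.group(1)), int(m.group(2)), int(m.group(3)))
        for m in _INTEGRAL.finditer(text)
    ]


def count_below_threshold(segments: List[Segment], threshold: float = 0.0) -> int:
    """Number of segments whose likelihood is below the threshold."""
    return sum(1 for seg in segments if seg.likelihood < threshold)
```

For the three lines above, the parser would return two segments (649 to 665 and 6 to 22), both below the threshold; the "peripheral" line is not matched.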

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

A related GBS gene <SEQ ID 8585> and protein <SEQ ID 8586> were also identified. Analysis of this 

protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 7 
McG: Discrim Score: 6.68 
GvH: Signal Score (-7.5): 0.87 

Possible site: 28 
>>> Seems to have a cleavable N-term signal seq. 
ALOM program count: 1 value: -5.79 threshold: 0.0 

INTEGRAL Likelihood = -5.79 Transmembrane 779 - 795 ( 778 - 797) 
PERIPHERAL Likelihood = 0.53 251 
modified ALOM score: 1.66 

*** Reasoning Step: 3 



Final Results 

bacterial membrane --- Certainty=0.3314 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



LPXTG motif: 769-773 



The protein has homology with the following sequences in the databases: 

ORF01378(298 - 2337 of 3000)

GP|6782402|emb|CAB70615.1||AJ133440(1 - 680 of 683) cyclo-nucleotide phosphodiesterase,
putative {Streptococcus dysgalactiae subsp. equisimilis}
%Match = 38.3
%Identity = 59.0 %Similarity = 72.3

Matches = 403 Mismatches = 181 Conservative Sub.s = 91



LFYHFLT*K*KKLEAQKELXTK*MCLTKLSFINKRLFLV*SLKIIRK*D*LNVFNI<L**FL *DNIHVMF*WRRFMSKHY 

1 = 1 I 
MMTKGY 






FSKSVFALTVLTATATSGL&VQAEDIVTTPSSTSTKVESTTPTSTIAEEKSNVTSTPTAITI^STATSTNyXTO^NPQP 

III Ml =: I :||: hi =11 h II I I I I if 7 : = 

MSKSAIFI^LVAAGSAQLT-KAEETTAVEPLTTT-ANTTTSTAVPAETAPLVADTTPATATADTAVPSPVNSTSSE-MA 



VATEATTSDLKPIEGEKVDLRIMSTTDLHTNLVNYDYYQD^ 

h ■■ hlh Ihlhlllllhlllllllllllhl MIMIMIh Mill I I I I I I I I - I I I I I 

TASADNATVTAPVEGQSVDWILSTTDLHSNLVl^ 



825 855 885 915 945 975 1005 1035 

TYKAIVKPVAElTOEHPMYQA^ALGYDASTLGmEF^ 

llllll II =1 Mil : II MIIIIIIIIIIIMIII - I I I I I I I II =1 II =1111111 

TYKAIVDPVEADEWMYAALKAI^FDASTLGMffi 



1055 1095 1125 1155 1185 1215 1243 1273 

DSTGRAVALNIGITGIVPPQILNWDKA1^EGKVIVKDSV?CAISEIVPTMRAKGADVILVLS TIALEMIDMKKVKKTLAI 

I h hi llllhllllhMIIIII III ||hhh = h:|hll 111-111= II I II III I 
DKDGKOTSLKIGITGWPPQIMSWDKANLTGKOTVKDAVE^^ 

260 270 280 290 300 310 320 

1303 1357 1387 1416 1446 1476 1500 

KLPASREWMPLtiRDTHTL- -NFHQVTVLASMKNTLELMVSM KINGTPVTMAGKYGDHLGIIDLGLSVTNGKWQVS--ES 

II I M Ml >h h I : == III 1 1 1 1 lllllllhlll hlllhhh =1 

KLLALKVLMQWSOAIHML I SQPYQMAVFT ITSKVLMVKRAL ! - INGVPVTMGGKYGDHLGLIDLNLTYTNGQVIKV11KDQS 
340 350 360 370 380 390 400 

1530 1560 1590 1620 1650 1680 1710 1740 

SAKIRKIDMNSTTADERIIALAKEAHDGTIimRQQVGlTTAPITSYFALVKDDPSVOIVXNAORVm/ANELKGTPEANL 

h I'll i i iimmiiih iiiimiinii mihiiiihiii linn n imimi 

RAETRQID8KSNQVDPTI IALAKEAHDGTVAYVRQQVGTTTAPINSYFAL I KDDPS I QI VNNAQRWYAEKELAGTPEANL 



1770 1800 1830 1860 1887 

PLLSAAAPFKAGTRGDATAYTDIPAGPVAIKKVADLYLYDNVTAL-L 
IMMMMMI I := := : = I II II I II h I I I II I M I II lh = M 

PLLSAAAPFKAGYTKMMRQLILIFLLVQSLSKMSLTFTCTTTSIjLFLKVTGADLKEWLEMSAGQFNTIDPSKSEPQDLVW 



2007 2037 2067 2097 2127 2157 2187 2217 

TEYRTYNFDVIDGLTYKFDITQPNKYNKDGKVVNSQASRV^ 

I IIIIIIIIIIMlhlhll llh I Ml MIMM I II : MMIMMMIMI III II : I I I 



2247 2277 2307 2337 2367 2397 2427 2457

LNLENRQTIIirailSEKTINPTADNIWGFTESIKDLDLRFQTA 

IMIMI MllhMIMIhlMM I =MI hill 

LNLENRQAIINY1VAEKTIKPSADNNWYFADTIKGLNLRFLKR 
650 660 670 680 


SEQ ID 8586 (GBS53) was expressed in E.coli as a His-fusion product. The purified protein is shown in 
Figure 196, lane 9. 
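
As a consistency check on the BLAST summary line reported for the first homolog in this example (Identities = 133/567, Positives = 214/567, Gaps = 147/567), the following minimal Python sketch recomputes the percentages from the counts. It assumes the percentages are truncated rather than rounded, which is what the reported values suggest (147/567 is about 25.9% and is reported as 25%). The helper name check_summary and the regular expression are illustrative and are not part of the original analysis.

```python
import re

# Hypothetical helper for checking a BLAST summary line; the pattern below
# is an assumption about the line layout, not a specification of BLAST output.
STAT = re.compile(r"(Identities|Positives|Gaps)\s*=\s*(\d+)/(\d+)\s*\((\d+)%\)")


def check_summary(line):
    """Return {field: (count, length, reported_pct, recomputed_pct)}."""
    out = {}
    for name, num, den, pct in STAT.findall(line):
        num, den, pct = int(num), int(den), int(pct)
        # Integer division truncates toward zero for non-negative values.
        out[name] = (num, den, pct, num * 100 // den)
    return out


summary = "Identities = 133/567 (23%), Positives = 214/567 (37%), Gaps = 147/567 (25%)"
result = check_summary(summary)
for field, (num, den, reported, recomputed) in result.items():
    print(field, num, den, reported, recomputed)
```

Under the truncation assumption, the recomputed value equals the reported value for all three fields of this line.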

Example 502 

A DNA sequence (GBSx0540) was identified in S.agalactiae <SEQ ID 1609> which encodes the amino
acid sequence <SEQ ID 1610>. Analysis of this protein sequence reveals the following:

N-terminal signal sequence






Final Results 

bacterial cytoplasm --- Certainty=0.0296 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.



Example 503 

A DNA sequence (GBSx0541) was identified in S.agalactiae <SEQ ID 1611> which encodes the amino
acid sequence <SEQ ID 1612>. Analysis of this protein sequence reveals the following: 

N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.1504 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10195> which encodes amino acid sequence <SEQ ID 
10196> was also identified. 

The protein has homology with the following sequences in the GENPEPT database: 



Query: 30
S+ T+ IK L SIPSPTG T
Sbjct: 3

Query: 90
RM+TAH+DTLGAMV+ IK DGRLKI DLIGG+ YN+IEGE C I + +GK +GT L+HQ
Sbjct: 63

Query: 150
TSVHVYKDAG AERNQ NMEIRLDE V +T LGI VGDF+SFDPR IT SGFIKS
Sbjct:

Query: 210
R+LDDK S +L+ L+ + EDI+LPYTTHF S EE+G+G NS+IP ETVEYLAVDM
Sbjct: 182

Query: 270
GA+GD Q TDEY+VS I CVKDASGPYHY+LR+HLV LAE ++I YKLDIYPYYGSDASAA+
Sbjct: 242

Query: 330
++G ++ H L+G GI++SH++ERTH S++ T L+
Sbjct: 302



There is also homology to SEQ ID 424. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 504 

A DNA sequence (GBSx0542) was identified in S.agalactiae <SEQ ID 1613> which encodes the amino 
acid sequence <SEQ ID 1614>. Analysis of this protein sequence reveals the following: 

Possible site: 20 

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.3157 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 



Query: 5 LIIIRGNSASGKSTIAKQLQAELGENTLLLSQDYLRREMLGTKDGENTTTIPLLINLLNY 64 

LI++RGNS SGKS++A+ L+ G + QDYLRR +L D I L+ + Y 

Sbjct: 23 LIVLRGNSGSGKSSVARALRERFGYGLAWVEQDYLRRVLLREHDVAGGKNIGLIETNVRY 82 

Query: 65 GYHNCSYIILEGILRSDWYTPWKHILKHNPNNTYAYYYDLSFQETVKRHSTRLKSLEFG 124 

S +LEGIL S Y P+ + + H + +Y+DL F+ETV+RH+TR ++ +FG 

Sbjct: 83 CLSAGSVTVLEGILFSRHYGPMLERL--HADFGGHWFYFDLPFEETVRRHATRPQAADFG 140

Query: 125 EDSIARWWLEKDFLKEIPEKILTKAMSLED 154 

+ W+ +D L + E+++ A SL D
Sbjct: 141 VQDMQAWFQARDVLPFVQEQLIGPASSLAD 170 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.
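
The "Final Results" lines in these examples each pair a predicted location with a certainty value and a qualitative call. A minimal Python sketch of how such lines could be read and the highest-certainty location selected is given below. It is an illustration only: the function name parse_final_results and the regular expression are assumptions, and the pattern is written loosely so that it tolerates the stray spaces and varying dash separators seen in the extracted text.

```python
import re

# Hypothetical pattern: location name, any separator, certainty value
# (possibly containing stray spaces), then the qualitative call in parentheses.
LINE = re.compile(
    r"^(bacterial \w+)\W+Certainty\s*=\s*([0-9.\s]+?)\s*\(([^)]+)\)"
)


def parse_final_results(lines):
    """Return a list of (location, certainty, call) tuples."""
    results = []
    for line in lines:
        match = LINE.match(line.strip())
        if match:
            # Remove spaces that extraction may have inserted into the number.
            value = float(match.group(2).replace(" ", ""))
            results.append((match.group(1), value, match.group(3)))
    return results


# Values taken from the Final Results reported above for SEQ ID 1614.
example = [
    "bacterial cytoplasm --- Certainty=0.3157 (Affirmative) < succ>",
    "bacterial membrane Certainty=0 . 0000 (Not Clear) < succ>",
    "bacterial outside --- Certainty=0.0000 (Not Clear) < succ>",
]
results = parse_final_results(example)
best = max(results, key=lambda r: r[1])
print(best)
```

Selecting the maximum certainty reproduces the localisation marked "Affirmative" in the listing; ties among zero-certainty entries are not meaningful and are left in input order.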



Example 505 

A DNA sequence (GBSx0543) was identified in S.agalactiae <SEQ ID 1615> which encodes the amino 
acid sequence <SEQ ID 1616>. This protein is predicted to be periplasmic-iron-binding protein BitC. 
Analysis of this protein sequence reveals the following: 

Possible site: 29

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood =-11.46 Transmembrane 9 - 25 ( 5-30) 



Final Results 

bacterial membrane --- Certainty=0.5585 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAD18094 GB:U75349 periplasmic-iron-binding protein BitA
[Brachyspira hyodysenteriae] (ver 2)
Identities = 114/331 (34%), Positives = 184/331 (55%), Gaps = 3/331 (0%) 

Query: 11 YILLWSIIFISVFTYSISQPSKLLPPKELVILSPNSQAILTGTIPAFEEKY-GIKVKLI 69 
+I+ + ++ +++F S SK LVI + ++ + F+ K I V+++

Sbjct: 4 FI IFCMLMLSMTLFYSCSSGDSK--NANSLVIYCSHPLDLMNTILDDFKAKNPDINVEW 61

Query: 70 C^GTGQLIDRLSKEGKQLKADIFFGGNYTQFESHKALFESYVSKNVHTVIPDYIHPSDTA 129
GTG+L+ R+ E D+ +GG +S LFE+Y S N ++ ++ +

Sbjct: 62 TAGTGELLKRVEAEKMNPLGDVLWGGTIuNSVKSKTDLFENYTSTNFANIIXiEFKNTEGPF 121



